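Killing processes stuck in uninterruptible sleep because of NFS
A machine had a large number of NFS partitions mounted, and one of the NFS servers went down (a hardware problem that kept it from booting for a while). As a result, several df processes hung along with it and could not be killed even with kill -9. The process state at the time was: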
[jianingy(0)@xxxxxx ~]$ ps ax -o pid,wchan,s,command | grep df$
3505 rpc_ex D df
3844 rpc_ex D df
4162 rpc_ex D df
[jianingy(0)@xxxxxx ~]$ pstree
init─┬─acpid
├─agetty
├─atd
├─crond
├─dbus-daemon-1
├─3*[df]
...
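All of the df processes were waiting in rpc_execute (shown truncated as rpc_ex in the wchan column). They were in uninterruptible sleep (state D in ps), so they would not handle any signals, which is why kill -9 had no effect. After a lot of searching, I found that the data rpc_execute waits for is supplied by rpciod. So running killall -KILL rpciod is enough to end the rpc_execute call, and rpciod restarts by itself after being killed.
This situation mostly arises because the NFS share was mounted with the default options, in which case the NFS client keeps retrying the server indefinitely when the connection fails. Mounting with the intr option avoids this kind of problem. Here is an excerpt from the description of the NFS mount options: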
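The D-state check described above can be sketched as a small shell helper. This is only a sketch, assuming a procps-style ps that accepts the same -o columns used in the transcript; the function names filter_d_state and list_d_state are made up for illustration.

```shell
#!/bin/sh
# Keep only the rows whose second column (process state) is D.
# Expects lines of the form: PID STATE WCHAN COMMAND
filter_d_state() {
    awk '$2 == "D"'
}

# List every process currently in uninterruptible sleep.
# The trailing "=" on each column name suppresses the ps header row.
list_d_state() {
    ps ax -o pid=,s=,wchan=,comm= | filter_d_state
}
```

Running list_d_state on a machine in the state shown above should list the hung df processes together with the kernel function they are waiting in.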
soft    If an NFS file operation has a major timeout then report an I/O error to the calling
        program. The default is to continue retrying NFS file operations indefinitely.
hard    If an NFS file operation has a major timeout then report "server not responding" on the
        console and continue retrying indefinitely. This is the default.
intr    If an NFS file operation has a major timeout and it is hard mounted, then allow signals
        to interrupt the file operation and cause it to return EINTR to the calling program.
        The default is to not allow file operations to be interrupted.
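As a rough illustration of the options quoted above, the following sketch only prints a mount command instead of running it. The server name, export path, mount point, and the timeo/retrans values are all hypothetical, and nfs_mount_cmd is a helper invented for this example. Note also that, according to nfs(5), the intr option is ignored on kernels after 2.6.25, where only SIGKILL can interrupt a pending NFS operation, so this applies to older systems like the one described here.

```shell
#!/bin/sh
# Print (do not run) a mount command for a hard, interruptible NFS mount.
# Arguments: server, exported path, local mount point (hypothetical values).
nfs_mount_cmd() {
    server=$1
    export_path=$2
    mount_point=$3
    # hard:      keep retrying indefinitely (the default behaviour)
    # intr:      allow signals to interrupt a hung file operation
    # timeo=30:  wait 3 seconds before retransmitting (unit is 0.1 s)
    # retrans=3: number of retries before further recovery action
    echo "mount -t nfs -o hard,intr,timeo=30,retrans=3 ${server}:${export_path} ${mount_point}"
}

nfs_mount_cmd nfs1.example.com /export/data /mnt/data
```

Replacing hard,intr with soft would make operations fail with an I/O error after a major timeout instead of blocking, as the excerpt describes.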