Re: 'umount -f /mnt/foo' fails if server IP is gone.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Oct 17, 2013, at 1:35 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:

> On 10/15/2013 11:29 AM, Ben Greear wrote:
>> Is 'umount -f' supposed to always work, even if the file server
>> goes away?
>> 
>> I have a user's system that just hangs forever in this case.
>> 
>> Could be local changes we have made, but I'm curious about
>> the expected behaviour before I go digging too deep...
> 
> Any input on this?  I don't mind trying to fix it, but I
> would like to know how it is supposed to work.

Recent kernels emit a GETATTR at umount time.  It is probably this operation that is stuck.


> Older kernels do not hang (we tried 3.0.x), but I'm not sure
> exactly where the problem started.
> 
> Test case was to set up NFSv3 mount, then pull the Ethernet cable
> on the nfs client machine.  This system is running 3.9.11+ kernel.
> 
> From /proc/mounts:
> 
> 10.2.46.90:/nfs_export on /mnt/lf/nfs3-001 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.2.46.90,mountvers=3,mountport=19408,mountproto=udp,srcaddr=10.2.46.91,local_lock=none,addr=10.2.46.90)
> 
> # umount /mnt/lf/nfs3-001
> ^C
> # umount -f /mnt/lf/nfs3-001
> [hangs forever it seems, certainly for a long time]
> 
> 
> Here is a stack trace of hung processes, for instance:
> 
> Oct 17 10:24:18 localhost kernel: [688601.930366] SysRq : Show Blocked State
> Oct 17 10:24:18 localhost kernel: [688601.931016]   task                PC stack   pid father
> Oct 17 10:24:18 localhost kernel: [688601.931016] mkdir           D f1bf6700     0 16898  16831 0x00000082
> Oct 17 10:24:18 localhost kernel: [688601.931016]  f070bd8c 00000046 00000282 f1bf6700 f5b55a20 c0d7e400 f5b55a20 c0d7e400
> Oct 17 10:24:18 localhost kernel: [688601.931016]  c0d7e400 c0d7e400 c0d7e400 f79e9400 f5b55a20 f79e9400 f5b55a20 f58b19c0
> Oct 17 10:24:18 localhost kernel: [688601.931016]  f8dc4fd0 f070bd50 f0ce9924 f070bd50 f8ec6bff f070bd94 f8dbf9f7 ee91a138
> Oct 17 10:24:18 localhost kernel: [688601.931016] Call Trace:
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8ec6bff>] ? rpc_put_task+0xf/0x20 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8dbf9f7>] ? nfs_initiate_write+0xb7/0xe0 [nfs]
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c04a9f0e>] ? ktime_get_ts+0x3e/0x110
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c09cb133>] schedule+0x23/0x60
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c09cb1e6>] io_schedule+0x76/0xc0
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c051607d>] sleep_on_page+0xd/0x20
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c09c8d4d>] __wait_on_bit+0x4d/0x70
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c0516070>] ? __lock_page+0x90/0x90
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c0516301>] wait_on_page_bit+0x91/0xa0
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c0478710>] ? wake_atomic_t_function+0x50/0x50
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c05164cb>] filemap_fdatawait_range+0xcb/0x150
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c05166c7>] filemap_write_and_wait_range+0x97/0xb0
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8db4074>] nfs_file_fsync+0x44/0xa0 [nfs]
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8db4030>] ? nfs_file_fsync_commit+0xb0/0xb0 [nfs]
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c058e1f9>] vfs_fsync_range+0x59/0x70
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c058e237>] vfs_fsync+0x27/0x30
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8db4b0b>] nfs_file_flush+0x6b/0x90 [nfs]
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c05631a1>] filp_close+0x31/0x80
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c057ea55>] put_files_struct+0x85/0xe0
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c057eaf7>] exit_files+0x47/0x60
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c045b83c>] do_exit+0x25c/0x980
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c056a0be>] ? SyS_stat64+0x2e/0x40
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c045bf9e>] do_group_exit+0x3e/0xa0
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c045c018>] SyS_exit_group+0x18/0x20
> Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c09d370d>] sysenter_do_call+0x12/0x28
> Oct 17 10:24:18 localhost kernel: [688601.931016] umount.nfs      D f11c4900     0 17150  17149 0x00000080
> Oct 17 10:24:18 localhost kernel: [688602.225057]  f3955d00 00000082 efea0d8c f11c4900 f3955c8c c08d9f96 f104e700 c0d7e400
> Oct 17 10:24:18 localhost kernel: [688602.225057]  c0d7e400 c0d7e400 c0d7e400 efea0d8c efea0c80 f79db400 f104e700 c0c3e980
> Oct 17 10:24:18 localhost kernel: [688602.225057]  f3955cd0 f3955cb4 f3955e90 0000002c 0000005c 132df575 efea0d80 0000005c
> Oct 17 10:24:18 localhost kernel: [688602.225057] Call Trace:
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c08d9f96>] ? __kfree_skb+0x36/0x90
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c09cb133>] schedule+0x23/0x60
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec6edd>] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c09c8d4d>] __wait_on_bit+0x4d/0x70
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec6eb0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec6eb0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c09c8e1b>] out_of_line_wait_on_bit+0xab/0xc0
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c0478710>] ? wake_atomic_t_function+0x50/0x50
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec7f9e>] __rpc_execute+0x11e/0x290 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ebf130>] ? rpcproc_decode_null+0x10/0x10 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ebf130>] ? rpcproc_decode_null+0x10/0x10 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c047865f>] ? wake_up_bit+0x5f/0x70
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec814c>] rpc_execute+0x3c/0xa0 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec0f09>] rpc_run_task+0x59/0x70 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec1022>] rpc_call_sync+0x42/0xa0 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8e0b46c>] nfs3_rpc_wrapper.clone.0+0x5c/0xa0 [nfsv3]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8e0c0d4>] nfs3_proc_getattr+0x34/0x40 [nfsv3]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8db7397>] __nfs_revalidate_inode+0xc7/0x140 [nfs]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8db743f>] nfs_revalidate_inode+0x2f/0x60 [nfs]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8db14a8>] nfs_weak_revalidate+0x38/0x50 [nfs]
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c056fba8>] complete_walk+0xa8/0xf0
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c0571e53>] path_lookupat+0x63/0x690
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c05724ae>] filename_lookup+0x2e/0xc0
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c05733a3>] user_path_at_empty+0x43/0x80
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c0578b9e>] ? __d_free+0x2e/0x50
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c064450c>] ? security_capable+0x1c/0x30
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c05733ff>] user_path_at+0x1f/0x30
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c05807c3>] SyS_umount+0x83/0x380
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c04d2606>] ? __audit_syscall_exit+0x1f6/0x290
> Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c09d370d>] sysenter_do_call+0x12/0x28
> 
> ....
> 
> Oct 17 10:24:42 localhost kernel: [688631.186190] INFO: task mkdir:16898 blocked for more than 180 seconds.
> Oct 17 10:24:42 localhost kernel: [688631.195666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 17 10:24:42 localhost kernel: [688631.206304] mkdir           D f1bf6700     0 16898  16831 0x00000082
> Oct 17 10:24:42 localhost kernel: [688631.215220]  f070bd8c 00000046 00000282 f1bf6700 f5b55a20 c0d7e400 f5b55a20 c0d7e400
> Oct 17 10:24:42 localhost kernel: [688631.225933]  c0d7e400 c0d7e400 c0d7e400 f79e9400 f5b55a20 f79e9400 f5b55a20 f58b19c0
> Oct 17 10:24:42 localhost kernel: [688631.236712]  f8dc4fd0 f070bd50 f0ce9924 f070bd50 f8ec6bff f070bd94 f8dbf9f7 ee91a138
> Oct 17 10:24:42 localhost kernel: [688631.247550] Call Trace:
> Oct 17 10:24:42 localhost kernel: [688631.252746]  [<f8ec6bff>] ? rpc_put_task+0xf/0x20 [sunrpc]
> Oct 17 10:24:42 localhost kernel: [688631.261369]  [<f8dbf9f7>] ? nfs_initiate_write+0xb7/0xe0 [nfs]
> Oct 17 10:24:42 localhost kernel: [688631.270065]  [<c04a9f0e>] ? ktime_get_ts+0x3e/0x110
> Oct 17 10:24:42 localhost kernel: [688631.277724]  [<c09cb133>] schedule+0x23/0x60
> Oct 17 10:24:42 localhost kernel: [688631.285298]  [<c09cb1e6>] io_schedule+0x76/0xc0
> Oct 17 10:24:42 localhost kernel: [688631.292738]  [<c051607d>] sleep_on_page+0xd/0x20
> Oct 17 10:24:42 localhost kernel: [688631.300316]  [<c09c8d4d>] __wait_on_bit+0x4d/0x70
> Oct 17 10:24:42 localhost kernel: [688631.308117]  [<c0516070>] ? __lock_page+0x90/0x90
> Oct 17 10:24:42 localhost kernel: [688631.315731]  [<c0516301>] wait_on_page_bit+0x91/0xa0
> Oct 17 10:24:42 localhost kernel: [688631.323630]  [<c0478710>] ? wake_atomic_t_function+0x50/0x50
> Oct 17 10:24:42 localhost kernel: [688631.332536]  [<c05164cb>] filemap_fdatawait_range+0xcb/0x150
> Oct 17 10:24:42 localhost kernel: [688631.341221]  [<c05166c7>] filemap_write_and_wait_range+0x97/0xb0
> Oct 17 10:24:42 localhost kernel: [688631.350224]  [<f8db4074>] nfs_file_fsync+0x44/0xa0 [nfs]
> Oct 17 10:24:42 localhost kernel: [688631.358569]  [<f8db4030>] ? nfs_file_fsync_commit+0xb0/0xb0 [nfs]
> Oct 17 10:24:42 localhost kernel: [688631.367764]  [<c058e1f9>] vfs_fsync_range+0x59/0x70
> Oct 17 10:24:42 localhost kernel: [688631.375818]  [<c058e237>] vfs_fsync+0x27/0x30
> Oct 17 10:24:42 localhost kernel: [688631.383346]  [<f8db4b0b>] nfs_file_flush+0x6b/0x90 [nfs]
> Oct 17 10:24:42 localhost kernel: [688631.392117]  [<c05631a1>] filp_close+0x31/0x80
> Oct 17 10:24:42 localhost kernel: [688631.399741]  [<c057ea55>] put_files_struct+0x85/0xe0
> Oct 17 10:24:42 localhost kernel: [688631.407871]  [<c057eaf7>] exit_files+0x47/0x60
> Oct 17 10:24:42 localhost kernel: [688631.415535]  [<c045b83c>] do_exit+0x25c/0x980
> Oct 17 10:24:42 localhost kernel: [688631.423133]  [<c056a0be>] ? SyS_stat64+0x2e/0x40
> Oct 17 10:24:42 localhost kernel: [688631.431078]  [<c045bf9e>] do_group_exit+0x3e/0xa0
> Oct 17 10:24:42 localhost kernel: [688631.439103]  [<c045c018>] SyS_exit_group+0x18/0x20
> Oct 17 10:24:42 localhost kernel: [688631.447169]  [<c09d370d>] sysenter_do_call+0x12/0x28
> Oct 17 10:24:54 localhost kernel: [688643.517069] RPC: AUTH_GSS upcall timed out.
> 
> 
> Thanks,
> Ben
> 
> 
> -- 
> Ben Greear <greearb@xxxxxxxxxxxxxxx>
> Candela Technologies Inc  http://www.candelatech.com
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux