On 22.04.19 at 23:37, Nathan Dehnel wrote:
> I have a raid10 volume that frequently locks up when I try to write to
> it or delete things. Any command that touches it will hang (and can't
> be killed) and I have to start a new ssh session to get into the
> computer again. Nothing fixes it besides a reboot, and the volume will
> fail to unmount while the computer is shutting down.
>
> [ 302.360912] sysrq: SysRq : Show Blocked State
> [ 302.360951] task PC stack pid father
> [ 302.360987] btrfs-transacti D 0 2187 2 0x80000000
> [ 302.360993] Call Trace:
> [ 302.361007] ? __schedule+0x59d/0x5f1
> [ 302.361012] schedule+0x6a/0x85
> [ 302.361019] btrfs_commit_transaction+0x219/0x7ac
> [ 302.361027] ? wait_woken+0x6d/0x6d
> [ 302.361031] transaction_kthread+0xc9/0x135
> [ 302.361036] ? btrfs_cleanup_transaction+0x4c7/0x4c7
> [ 302.361041] kthread+0x115/0x11d
> [ 302.361046] ? kthread_park+0x76/0x76
> [ 302.361050] ret_from_fork+0x35/0x40

BTRFS is waiting to commit its transaction.

> [ 302.361064] nfsd D 0 2292 2 0x80000000
> [ 302.361067] Call Trace:
> [ 302.361072] ? __schedule+0x59d/0x5f1
> [ 302.361077] schedule+0x6a/0x85
> [ 302.361120] wait_current_trans+0x9b/0xd8
> [ 302.361126] ? wait_woken+0x6d/0x6d
> [ 302.361131] start_transaction+0x1ae/0x38e
> [ 302.361135] btrfs_create+0x59/0x1d0
> [ 302.361142] vfs_create+0xbf/0xef
> [ 302.361160] do_nfsd_create+0x2be/0x41d [nfsd]
> [ 302.361214] nfsd4_open+0x223/0x578 [nfsd]
> [ 302.361229] nfsd4_proc_compound+0x44a/0x562 [nfsd]
> [ 302.361240] nfsd_dispatch+0xb9/0x16e [nfsd]
> [ 302.361258] svc_process+0x524/0x6e2 [sunrpc]
> [ 302.361270] ? nfsd_destroy+0x5f/0x5f [nfsd]
> [ 302.361278] nfsd+0xf9/0x150 [nfsd]
> [ 302.361284] kthread+0x115/0x11d
> [ 302.361289] ? kthread_park+0x76/0x76
> [ 302.361292] ret_from_fork+0x35/0x40

Here it seems btrfs is exposed via NFS: a client requested a file to be
created, and that request is waiting for the current transaction to
finish.

> [ 302.361297] nfsd D 0 2293 2 0x80000000
> [ 302.361300] Call Trace:
> [ 302.361305] ? __schedule+0x59d/0x5f1
> [ 302.361309] schedule+0x6a/0x85
> [ 302.361314] rwsem_down_write_failed+0x1af/0x210
> [ 302.361325] ? nfsd_permission+0xa3/0xe8 [nfsd]
> [ 302.361330] call_rwsem_down_write_failed+0x13/0x20
> [ 302.361335] down_write+0x20/0x2e
> [ 302.361345] nfsd_unlink+0xb1/0x16b [nfsd]
> [ 302.361359] nfsd4_remove+0x4e/0x10a [nfsd]
> [ 302.361371] nfsd4_proc_compound+0x44a/0x562 [nfsd]
> [ 302.361381] nfsd_dispatch+0xb9/0x16e [nfsd]
> [ 302.361395] svc_process+0x524/0x6e2 [sunrpc]
> [ 302.361401] ? __mutex_unlock_slowpath.isra.6+0x1e8/0x20a
> [ 302.361410] ? nfsd_destroy+0x5f/0x5f [nfsd]
> [ 302.361419] nfsd+0xf9/0x150 [nfsd]
> [ 302.361424] kthread+0x115/0x11d
> [ 302.361428] ? kthread_park+0x76/0x76
> [ 302.361434] ret_from_fork+0x35/0x40

Here NFSD is waiting on a lock of its own, presumably acquired by PID
2292, which in turn is waiting for btrfs PID 2187.

> [ 302.361441] rm D 0 2388 2334 0x00000004
> [ 302.361444] Call Trace:
> [ 302.361449] ? __schedule+0x59d/0x5f1
> [ 302.361453] schedule+0x6a/0x85
> [ 302.361457] wait_current_trans+0x9b/0xd8
> [ 302.361462] ? wait_woken+0x6d/0x6d
> [ 302.361466] start_transaction+0x1ae/0x38e
> [ 302.361471] btrfs_start_transaction_fallback_global_rsv+0x32/0x127
> [ 302.361475] btrfs_unlink+0x30/0xc0
> [ 302.361478] vfs_unlink+0xd2/0x147
> [ 302.361482] do_unlinkat+0x112/0x223
> [ 302.361488] do_syscall_64+0x7e/0x133
> [ 302.361492] entry_SYSCALL_64_after_hwframe+0x44/0xa9

This rm is, again, waiting for btrfs' current transaction to finish.

> [ 302.361496] RIP: 0033:0x7f681509b5d7
> [ 302.361504] Code: Bad RIP value.
> [ 302.361506] RSP: 002b:00007fffb1aed668 EFLAGS: 00000202 ORIG_RAX: 0000000000000107
> [ 302.361510] RAX: ffffffffffffffda RBX: 000055672760c6c0 RCX: 00007f681509b5d7
> [ 302.361512] RDX: 0000000000000000 RSI: 000055672760b490 RDI: 00000000ffffff9c
> [ 302.361514] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
> [ 302.361516] R10: fffffffffffff12b R11: 0000000000000202 R12: 00007fffb1aed848
> [ 302.361518] R13: 000055672760b400 R14: 0000000000000002 R15: 0000000000000000
>

There isn't a lot to be done with the information you have provided. At
the very least:

1. Provide a backtrace of all threads on the system via
   "echo t > /proc/sysrq-trigger"

2. Provide the source code line number of
   btrfs_commit_transaction+0x219/0x7ac. This can be done by executing:
   ./faddr2line[0] vmlinux btrfs_commit_transaction+0x219/0x7ac

3. State your kernel version

Of course you will need the unstripped vmlinux image of your kernel.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/faddr2line
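
In case it helps with step 2, here is a rough sketch of how a trace entry such as btrfs_commit_transaction+0x219/0x7ac breaks down (symbol, offset into the function, total function size) and what the matching faddr2line invocation might look like. The helper names parse_trace_entry and faddr2line_cmd are made up for illustration, and /usr/src/linux is only an assumed location for a kernel tree containing an unstripped vmlinux; adjust it to wherever your build lives. The sketch only prints the command, it does not run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): split a bare
# call-trace entry of the form symbol+offset/size into its three fields.
parse_trace_entry() {
    entry=$1
    symbol=${entry%%+*}   # text before the first '+'
    rest=${entry#*+}      # remaining "offset/size" part
    offset=${rest%%/*}    # byte offset into the function
    size=${rest#*/}       # total size of the function
    printf '%s %s %s\n' "$symbol" "$offset" "$size"
}

# Hypothetical helper: print (do not run) the faddr2line command line.
# $1 is an assumed kernel tree holding scripts/faddr2line and vmlinux.
faddr2line_cmd() {
    printf '%s/scripts/faddr2line %s/vmlinux %s\n' "$1" "$1" "$2"
}

parse_trace_entry 'btrfs_commit_transaction+0x219/0x7ac'
faddr2line_cmd /usr/src/linux 'btrfs_commit_transaction+0x219/0x7ac'

# The requested data itself would be gathered roughly like this, as root:
#   uname -r                       # kernel version
#   echo t > /proc/sysrq-trigger   # dump all tasks to the kernel log
#   dmesg > all-tasks.txt          # save the dump (file name is arbitrary)
```

The first call prints the three fields separated by spaces; the second prints the command you would then run by hand against your own vmlinux.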
