At 01/31/2017 08:15 AM, Philipp Kern wrote:
Hi,
my btrfs-based system (~2.5 TiB stored in the filesystem, replicated onto
two disks, running kernel 4.9.6-1-ARCH) locked up after I enabled
quotas while a btrfs-size tool was running. Now the question is how to
recover from that. Whenever I mount the filesystem, btrfs-cleaner and a
kworker end up hanging:
[ 491.154603] INFO: task kworker/u128:3:105 blocked for more than 120 seconds.
[ 491.188559] Not tainted 4.9.6-1-ARCH #1
v4.9.x has a bad qgroup patch, which hugely slows down anything related
to relocation.
The v4.10-rc releases include a fix for that, which should make the
process a little faster.
Would you please try a v4.10-rc kernel to see if it solves the problem?
Thanks,
Qu
[ 491.209443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 491.247188] kworker/u128:3 D 0 105 2 0x00000000
[ 491.247208] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
[ 491.247210] ffff880103bc8800 0000000000000000 ffff8801034ba7c0 ffff8801062580c0
[ 491.247213] ffff880105fe8d40 ffffc90000c63c30 ffffffff81605cdf ffff8801034ba7c0
[ 491.247215] 0000000000000001 ffff8801062580c0 ffffffff810aa490 ffff8801034ba7c0
[ 491.247217] Call Trace:
[ 491.247222] [<ffffffff81605cdf>] ? __schedule+0x22f/0x6e0
[ 491.247224] [<ffffffff810aa490>] ? wake_up_q+0x80/0x80
[ 491.247226] [<ffffffff816061cd>] schedule+0x3d/0x90
[ 491.247237] [<ffffffffa01d248e>] wait_current_trans.isra.8+0xbe/0x110 [btrfs]
[ 491.247240] [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60
[ 491.247249] [<ffffffffa01d4d1c>] start_transaction+0x2bc/0x4a0 [btrfs]
[ 491.247258] [<ffffffffa01d4f18>] btrfs_start_transaction+0x18/0x20 [btrfs]
[ 491.247267] [<ffffffffa02442ba>] btrfs_qgroup_rescan_worker+0x7a/0x610 [btrfs]
[ 491.247278] [<ffffffffa0209abd>] btrfs_scrubparity_helper+0x7d/0x350 [btrfs]
[ 491.247288] [<ffffffffa0209dde>] btrfs_qgroup_rescan_helper+0xe/0x10 [btrfs]
[ 491.247291] [<ffffffff81098a95>] process_one_work+0x1e5/0x470
[ 491.247292] [<ffffffff81098d68>] worker_thread+0x48/0x4e0
[ 491.247294] [<ffffffff81098d20>] ? process_one_work+0x470/0x470
[ 491.247296] [<ffffffff8109e8f9>] kthread+0xd9/0xf0
[ 491.247298] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[ 491.247299] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
[ 491.247301] [<ffffffff8160a995>] ret_from_fork+0x25/0x30
[ 491.247306] INFO: task btrfs-cleaner:148 blocked for more than 120 seconds.
[ 491.280723] Not tainted 4.9.6-1-ARCH #1
[ 491.302026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 491.340471] btrfs-cleaner D 0 148 2 0x00000000
[ 491.340475] ffff880103bc8800 0000000000000000 ffff8801032acf80 ffff8801062580c0
[ 491.340478] ffff8801032a8d40 ffffc90000cc3cf0 ffffffff81605cdf ffff8801032acf80
[ 491.340480] 0000000000000001 ffff8801062580c0 ffffffff810aa490 ffff8801032acf80
[ 491.340482] Call Trace:
[ 491.340487] [<ffffffff81605cdf>] ? __schedule+0x22f/0x6e0
[ 491.340489] [<ffffffff810aa490>] ? wake_up_q+0x80/0x80
[ 491.340491] [<ffffffff816061cd>] schedule+0x3d/0x90
[ 491.340505] [<ffffffffa01d248e>] wait_current_trans.isra.8+0xbe/0x110 [btrfs]
[ 491.340508] [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60
[ 491.340517] [<ffffffffa01d4d1c>] start_transaction+0x2bc/0x4a0 [btrfs]
[ 491.340525] [<ffffffffa01d4f18>] btrfs_start_transaction+0x18/0x20 [btrfs]
[ 491.340534] [<ffffffffa01bb819>] btrfs_drop_snapshot+0x4e9/0x880 [btrfs]
[ 491.340542] [<ffffffffa01d3e7b>] btrfs_clean_one_deleted_snapshot+0xbb/0x110 [btrfs]
[ 491.340552] [<ffffffffa01ca7f1>] cleaner_kthread+0x141/0x1b0 [btrfs]
[ 491.340560] [<ffffffffa01ca6b0>] ? btrfs_destroy_pinned_extent+0x120/0x120 [btrfs]
[ 491.340562] [<ffffffff8109e8f9>] kthread+0xd9/0xf0
[ 491.340564] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[ 491.340565] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
[ 491.340566] [<ffffffff8160a995>] ret_from_fork+0x25/0x30
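Both traces share the same top frames: wait_current_trans() reached through btrfs_start_transaction(), differing only in the caller (the qgroup rescan worker versus btrfs_drop_snapshot() from the cleaner). When comparing several hung-task reports like these, it can help to reduce each one to its reliable btrfs frames. The following is a minimal Python sketch, assuming the log text has exactly the format shown above; the function name hung_btrfs_frames is hypothetical and not part of any btrfs tooling. Frames the kernel prints with a leading "?" are treated as unreliable and skipped.

```python
import re

# Hung-task watchdog header, e.g.
# "INFO: task btrfs-cleaner:148 blocked for more than 120 seconds."
# The task name may itself contain ':' (kworker/u128:3), so the PID is the
# last ':'-separated number before " blocked".
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Call-trace frame, e.g.
# "[<ffffffffa01d248e>] wait_current_trans.isra.8+0xbe/0x110 [btrfs]".
# A "? " before the symbol marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<q>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)


def hung_btrfs_frames(log):
    """Map (task name, pid) -> reliable btrfs frames, innermost first."""
    result = {}
    current = None
    for line in log.splitlines():
        header = HUNG_RE.search(line)
        if header:
            # Start collecting frames for a new blocked task.
            current = (header["comm"], int(header["pid"]))
            result[current] = []
            continue
        frame = FRAME_RE.search(line)
        if current and frame and not frame["q"] and frame["mod"] == "btrfs":
            result[current].append(frame["func"])
    return result
```

Applied to the two traces above, this would list wait_current_trans.isra.8, start_transaction and btrfs_start_transaction first for both tasks, which makes the shared wait point easy to spot.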
Unfortunately, whenever I try to run a btrfs command against the
mounted filesystem -- e.g. to disable quotas -- the command hangs, and
this is in a shell without job control over a serial console.
Relevant output from ps:
105 0 0 DW [kworker/u128:3]
107 0 0 SW [kworker/u128:5]
111 0 0 SW< [bioset]
112 0 0 SW< [bioset]
113 0 0 SW< [bioset]
115 0 0 SW [kworker/1:2]
117 0 0 SW< [kworker/0:1H]
118 0 0 SW< [kworker/1:1H]
122 0 0 SW< [bioset]
123 0 6724 S sh -i
128 0 0 SW< [btrfs-worker]
129 0 0 SW< [kworker/u129:0]
130 0 0 SW< [btrfs-worker-hi]
131 0 0 SW< [btrfs-delalloc]
132 0 0 SW< [btrfs-flush_del]
133 0 0 SW< [btrfs-cache]
134 0 0 SW< [btrfs-submit]
135 0 0 SW< [btrfs-fixup]
136 0 0 SW< [btrfs-endio]
137 0 0 SW< [btrfs-endio-met]
138 0 0 SW< [btrfs-endio-met]
139 0 0 SW< [btrfs-endio-rai]
140 0 0 SW< [btrfs-endio-rep]
141 0 0 SW< [btrfs-rmw]
142 0 0 SW< [btrfs-endio-wri]
143 0 0 SW< [btrfs-freespace]
144 0 0 SW< [btrfs-delayed-m]
145 0 0 SW< [btrfs-readahead]
146 0 0 SW< [btrfs-qgroup-re]
147 0 0 SW< [btrfs-extent-re]
148 0 0 DW [btrfs-cleaner]
149 0 0 RW [btrfs-transacti]
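In this listing, the state column is what matters: D marks uninterruptible sleep and R a running task. A small Python sketch for filtering such output, assuming the columns are PID, user, VSZ, state and command (as in BusyBox-style ps output); tasks_in_state is a hypothetical helper, not an existing tool:

```python
def tasks_in_state(ps_output, state):
    """Return (pid, command) for every row whose state starts with `state`.

    Assumes rows of the form "PID USER VSZ STAT COMMAND"; modifier letters
    after the first state character (e.g. the W and < in "SW<") are ignored.
    """
    hits = []
    for line in ps_output.splitlines():
        # Split into at most five fields so commands with spaces ("sh -i")
        # stay intact in the last field.
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header or malformed line
        pid, _user, _vsz, stat, cmd = fields
        if stat.startswith(state):
            hits.append((int(pid), cmd))
    return hits
```

For the listing above, tasks_in_state(listing, "D") would return the kworker (105) and btrfs-cleaner (148) entries, and tasks_in_state(listing, "R") the btrfs-transaction thread (149).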
So there's always a running btrfs-transaction. The kernel messages start
off like this:
[ 3.900674] BTRFS: device fsid e7ef324b-c81e-4ccf-941d-713b807ffab4 devid 1 transid 2030007 /dev/sdb2
[ 3.942600] BTRFS: device fsid e7ef324b-c81e-4ccf-941d-713b807ffab4 devid 2 transid 2030007 /dev/sda2
[ 14.569488] BTRFS info (device sda2): disk space caching is enabled
[ 14.569491] BTRFS info (device sda2): has skinny extents
[ 14.826782] random: crng init done
[ 30.738810] BTRFS info (device sda2): checking UUID tree
[ 62.916772] BTRFS info (device sda2): The free space cache file (880598319104) is invalid. skip it
[ 62.916772]
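The first two lines show both mirror devices reporting the same transid (2030007) at scan time, which suggests neither device missed the last committed transaction. A quick way to run that comparison on a saved boot log is sketched below in Python; the helper names are hypothetical, and the regular expression assumes exactly the "BTRFS: device fsid ... devid N transid T /dev/X" format shown above.

```python
import re

# Device-scan line, e.g.
# "BTRFS: device fsid e7ef... devid 1 transid 2030007 /dev/sdb2"
DEV_RE = re.compile(
    r"BTRFS: device fsid (?P<fsid>[0-9a-f-]+) devid (?P<devid>\d+) "
    r"transid (?P<transid>\d+) (?P<path>\S+)"
)


def transids_by_fsid(log):
    """Map fsid -> {device path: transid} from the device-scan lines."""
    out = {}
    for m in DEV_RE.finditer(log):
        out.setdefault(m["fsid"], {})[m["path"]] = int(m["transid"])
    return out


def devices_agree(log):
    """Map fsid -> True if every scanned device reported the same transid."""
    return {
        fsid: len(set(devs.values())) == 1
        for fsid, devs in transids_by_fsid(log).items()
    }
```

With the two scan lines above, devices_agree() would report True for fsid e7ef324b-c81e-4ccf-941d-713b807ffab4.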
The disk traffic quiets down after a while, without any further
message being printed to dmesg -- it would be useful to know when it has
finished checking the UUID tree.
Long story short: is there a way for me to disable quotas again without
mounting the filesystem? Or a way to keep btrfs from spawning cleanup
tasks before I can disable quotas? I now have many, many qgroups because
of the many snapshots created by snapper, and any command that touches
them hangs as well.
Kind regards and thanks
Philipp Kern