On Tue, Oct 30, 2018 at 05:14:42PM -0700, Omar Sandoval wrote:
>From: Omar Sandoval <osandov@xxxxxx>
>
>There's a race between close_ctree() and cleaner_kthread().
>close_ctree() sets btrfs_fs_closing(), and the cleaner stops when it
>sees it set, but this is racy; the cleaner might have already checked
>the bit and could be cleaning stuff. In particular, if it deletes unused
>block groups, it will create delayed iputs for the free space cache
>inodes. As of "btrfs: don't run delayed_iputs in commit", we're no
>longer running delayed iputs after a commit. Therefore, if the cleaner
>creates more delayed iputs after delayed iputs are run in
>btrfs_commit_super(), we will leak inodes on unmount and get a busy
>inode crash from the VFS.
>
>Fix it by parking the cleaner before we actually close anything. Then,
>any remaining delayed iputs will always be handled in
>btrfs_commit_super(). This also ensures that the commit in close_ctree()
>is really the last commit, so we can get rid of the commit in
>cleaner_kthread().
>
>Fixes: 30928e9baac2 ("btrfs: don't run delayed_iputs in commit")
>Signed-off-by: Omar Sandoval <osandov@xxxxxx>
>---
>We found this with a stress test that our containers team runs. I'm
>wondering if this same race could have caused any other issues other
>than this new iput thing, but I couldn't identify any.
I noticed an inode leak issue in generic/475, but whether dropping commit
30928e9baac2 ("btrfs: don't run delayed_iputs in commit") or applying
this patch, the issue still exists.
I have attached the dmesg.
>
> fs/btrfs/disk-io.c | 40 +++++++---------------------------------
> 1 file changed, 7 insertions(+), 33 deletions(-)
>
>diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>index b0ab41da91d1..7c17284ae3c2 100644
>--- a/fs/btrfs/disk-io.c
>+++ b/fs/btrfs/disk-io.c
>@@ -1664,9 +1664,8 @@ static int cleaner_kthread(void *arg)
> struct btrfs_root *root = arg;
> struct btrfs_fs_info *fs_info = root->fs_info;
> int again;
>- struct btrfs_trans_handle *trans;
>
>- do {
>+ while (1) {
> again = 0;
>
> /* Make the cleaner go to sleep early. */
>@@ -1715,42 +1714,16 @@ static int cleaner_kthread(void *arg)
> */
> btrfs_delete_unused_bgs(fs_info);
> sleep:
>+ if (kthread_should_park())
>+ kthread_parkme();
>+ if (kthread_should_stop())
>+ return 0;
> if (!again) {
> set_current_state(TASK_INTERRUPTIBLE);
>- if (!kthread_should_stop())
>- schedule();
>+ schedule();
> __set_current_state(TASK_RUNNING);
> }
>- } while (!kthread_should_stop());
>-
>- /*
>- * Transaction kthread is stopped before us and wakes us up.
>- * However we might have started a new transaction and COWed some
>- * tree blocks when deleting unused block groups for example. So
>- * make sure we commit the transaction we started to have a clean
>- * shutdown when evicting the btree inode - if it has dirty pages
>- * when we do the final iput() on it, eviction will trigger a
>- * writeback for it which will fail with null pointer dereferences
>- * since work queues and other resources were already released and
>- * destroyed by the time the iput/eviction/writeback is made.
>- */
>- trans = btrfs_attach_transaction(root);
>- if (IS_ERR(trans)) {
>- if (PTR_ERR(trans) != -ENOENT)
>- btrfs_err(fs_info,
>- "cleaner transaction attach returned %ld",
>- PTR_ERR(trans));
>- } else {
>- int ret;
>-
>- ret = btrfs_commit_transaction(trans);
>- if (ret)
>- btrfs_err(fs_info,
>- "cleaner open transaction commit returned %d",
>- ret);
> }
>-
>- return 0;
> }
>
> static int transaction_kthread(void *arg)
>@@ -3931,6 +3904,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
> int ret;
>
> set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
>+ kthread_park(fs_info->cleaner_kthread);
Since we are not going to call kthread_unpark, I am not sure why
kthread_park is used instead of kthread_stop here. It looks like there
is no significant difference between stopping instantly and delayed stop.
--
Thanks,
Lu
>
> /* wait for the qgroup rescan worker to stop */
> btrfs_qgroup_wait_for_completion(fs_info, false);
>--
>2.19.1
>
>
>
>
[ 366.955193] run fstests generic/475 at 2018-10-31 15:06:43
[ 367.495791] BTRFS: device fsid 812f883c-40b2-4456-9769-b94ddf1cb07e devid 1 transid 5 /dev/nvme0n1p2
[ 367.624469] BTRFS info (device dm-3): disk space caching is enabled
[ 367.627305] BTRFS info (device dm-3): has skinny extents
[ 367.628771] BTRFS info (device dm-3): flagging fs with big metadata feature
[ 367.637585] BTRFS info (device dm-3): enabling ssd optimizations
[ 367.640079] BTRFS info (device dm-3): checking UUID tree
[ 367.666290] btrfs_dev_stat_print_on_error: 86 callbacks suppressed
[ 367.666293] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[ 367.670665] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[ 367.673332] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[ 367.674024] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 367.676072] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
[ 367.677316] Buffer I/O error on dev dm-3, logical block 4718576, async page read
[ 367.682713] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
[ 367.685150] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
[ 367.690053] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[ 367.690298] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
[ 367.694166] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
[ 367.700484] BTRFS: error (device dm-3) in btrfs_commit_transaction:2239: errno=-5 IO failure (Error while writing out transaction)
[ 367.707185] BTRFS info (device dm-3): forced readonly
[ 367.708627] BTRFS warning (device dm-3): Skipping commit of aborted transaction.
[ 367.710526] BTRFS: error (device dm-3) in cleanup_transaction:1849: errno=-5 IO failure
[ 367.712330] BTRFS info (device dm-3): delayed_refs has NO entry
[ 367.975623] BTRFS error (device dm-3): cleaner transaction attach returned -30
[ 367.977998] BTRFS info (device dm-3): at unmount delalloc count 8192
[ 367.980424] WARNING: CPU: 2 PID: 14918 at fs/btrfs/disk-io.c:3846 btrfs_free_fs_root+0x94/0xa0 [btrfs]
[ 367.983228] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq efivarfs xfs nvme nvme_core
[ 367.986171] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Not tainted 4.19.0-rc8+ #18
[ 367.988154] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 367.990175] RIP: 0010:btrfs_free_fs_root+0x94/0xa0 [btrfs]
[ 367.997036] RSP: 0018:ffffc9000744bd60 EFLAGS: 00010282
[ 367.998595] RAX: ffff880354eb6270 RBX: ffff8803661c4548 RCX: 0000000000000000
[ 368.000665] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
[ 368.002527] RBP: ffff88033cd70000 R08: 0000000000000000 R09: 0000000000000001
[ 368.004426] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000744bd88
[ 368.006248] R13: ffff88033cd700c0 R14: ffffc9000744bd80 R15: 0000000000000000
[ 368.008082] FS: 00007f206293e780(0000) GS:ffff880373800000(0000) knlGS:0000000000000000
[ 368.010068] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.011654] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[ 368.013440] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.015232] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 368.016979] Call Trace:
[ 368.017947] btrfs_free_fs_roots+0xcf/0x160 [btrfs]
[ 368.019317] ? wait_for_completion+0x126/0x150
[ 368.020617] close_ctree+0x149/0x330 [btrfs]
[ 368.021822] generic_shutdown_super+0x6c/0x100
[ 368.023043] kill_anon_super+0x14/0x30
[ 368.024203] btrfs_kill_super+0x13/0x100 [btrfs]
[ 368.025471] deactivate_locked_super+0x2f/0x70
[ 368.027647] cleanup_mnt+0x3b/0x70
[ 368.028712] task_work_run+0x9e/0xd0
[ 368.029875] do_syscall_64+0x470/0x600
[ 368.031009] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 368.032350] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 368.033620] RIP: 0033:0x7f2062ec1b5b
[ 368.039127] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 368.040906] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[ 368.042564] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[ 368.044250] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[ 368.045862] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[ 368.047503] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[ 368.049089] irq event stamp: 0
[ 368.049979] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 368.051674] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.053587] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.055550] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 368.058098] ---[ end trace 08fc3163d80dad81 ]---
[ 368.061134] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:123 btrfs_put_block_group+0x51/0x60 [btrfs]
[ 368.063637] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq efivarfs xfs nvme nvme_core
[ 368.066230] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G W 4.19.0-rc8+ #18
[ 368.068615] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 368.070406] RIP: 0010:btrfs_put_block_group+0x51/0x60 [btrfs]
[ 368.076431] RSP: 0018:ffffc9000744bda0 EFLAGS: 00010206
[ 368.077765] RAX: 0000000000000001 RBX: ffff88033cd70000 RCX: ffff8803414f76c0
[ 368.079477] RDX: ffff8803414f76c0 RSI: 0000000000000001 RDI: ffff8803414f74e8
[ 368.081141] RBP: ffff8803414f74e8 R08: 0000000000000000 R09: 0000000000000000
[ 368.082830] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88033cd70118
[ 368.084565] R13: ffff88033cd70168 R14: ffff8803414f7680 R15: dead000000000100
[ 368.087151] FS: 00007f206293e780(0000) GS:ffff880373800000(0000) knlGS:0000000000000000
[ 368.089087] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.090616] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[ 368.092390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.094103] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 368.095812] Call Trace:
[ 368.096700] btrfs_free_block_groups+0x1da/0x3b0 [btrfs]
[ 368.098065] close_ctree+0x171/0x330 [btrfs]
[ 368.099254] generic_shutdown_super+0x6c/0x100
[ 368.100486] kill_anon_super+0x14/0x30
[ 368.101554] btrfs_kill_super+0x13/0x100 [btrfs]
[ 368.102758] deactivate_locked_super+0x2f/0x70
[ 368.103986] cleanup_mnt+0x3b/0x70
[ 368.105002] task_work_run+0x9e/0xd0
[ 368.106038] do_syscall_64+0x470/0x600
[ 368.107123] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 368.108397] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 368.109677] RIP: 0033:0x7f2062ec1b5b
[ 368.115270] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 368.117866] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[ 368.119595] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[ 368.121368] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[ 368.122983] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[ 368.124680] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[ 368.126330] irq event stamp: 0
[ 368.127242] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 368.128874] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.130836] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.132742] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 368.134443] ---[ end trace 08fc3163d80dad82 ]---
[ 368.140638] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:9789 btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[ 368.143175] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq efivarfs xfs nvme nvme_core
[ 368.145923] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G W 4.19.0-rc8+ #18
[ 368.148908] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 368.150766] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[ 368.156851] RSP: 0018:ffffc9000744bdb0 EFLAGS: 00010206
[ 368.158212] RAX: ffff8803414f4620 RBX: ffff88033cd70000 RCX: 0000000000000000
[ 368.159883] RDX: 0000000000000002 RSI: ffff88033cd71750 RDI: ffff8803414f7a38
[ 368.161614] RBP: ffff8803414f0558 R08: 0000000000000000 R09: ffff8803414f7b70
[ 368.163340] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88033cd70118
[ 368.165029] R13: ffff8803414f4620 R14: 0000000000000000 R15: dead000000000100
[ 368.166730] FS: 00007f206293e780(0000) GS:ffff880373800000(0000) knlGS:0000000000000000
[ 368.168613] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.170055] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[ 368.171794] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.173501] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 368.175273] Call Trace:
[ 368.176969] close_ctree+0x171/0x330 [btrfs]
[ 368.178203] generic_shutdown_super+0x6c/0x100
[ 368.179443] kill_anon_super+0x14/0x30
[ 368.180560] btrfs_kill_super+0x13/0x100 [btrfs]
[ 368.181794] deactivate_locked_super+0x2f/0x70
[ 368.182975] cleanup_mnt+0x3b/0x70
[ 368.184024] task_work_run+0x9e/0xd0
[ 368.185051] do_syscall_64+0x470/0x600
[ 368.186098] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 368.187392] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 368.188675] RIP: 0033:0x7f2062ec1b5b
[ 368.194299] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 368.195990] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[ 368.197690] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[ 368.199348] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[ 368.200998] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[ 368.202651] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[ 368.204354] irq event stamp: 0
[ 368.205236] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 368.207825] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.209804] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.211807] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 368.213501] ---[ end trace 08fc3163d80dad83 ]---
[ 368.214836] BTRFS info (device dm-3): space_info 1 has 7708672 free, is not full
[ 368.216595] BTRFS info (device dm-3): space_info total=8388608, used=585728, pinned=0, reserved=86016, may_use=8192, readonly=0
[ 368.219751] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:9789 btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[ 368.222497] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq efivarfs xfs nvme nvme_core
[ 368.225120] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G W 4.19.0-rc8+ #18
[ 368.227581] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 368.229402] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[ 368.235564] RSP: 0018:ffffc9000744bdb0 EFLAGS: 00010206
[ 368.237770] RAX: ffff8803414f7b40 RBX: ffff88033cd70000 RCX: 000000000017000c
[ 368.239647] RDX: 0000000000000002 RSI: 000000000017000c RDI: ffffffff8125fb74
[ 368.241447] RBP: ffff8803414f47d0 R08: 0000000000000000 R09: 0000000000000000
[ 368.243206] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803414f4848
[ 368.245071] R13: ffff8803414f7b40 R14: 0000000000000000 R15: dead000000000100
[ 368.246825] FS: 00007f206293e780(0000) GS:ffff880373800000(0000) knlGS:0000000000000000
[ 368.248758] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.250245] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[ 368.251982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.253714] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 368.255491] Call Trace:
[ 368.256375] close_ctree+0x171/0x330 [btrfs]
[ 368.257584] generic_shutdown_super+0x6c/0x100
[ 368.258786] kill_anon_super+0x14/0x30
[ 368.259900] btrfs_kill_super+0x13/0x100 [btrfs]
[ 368.261180] deactivate_locked_super+0x2f/0x70
[ 368.262433] cleanup_mnt+0x3b/0x70
[ 368.263452] task_work_run+0x9e/0xd0
[ 368.264531] do_syscall_64+0x470/0x600
[ 368.265621] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 368.267805] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 368.269288] RIP: 0033:0x7f2062ec1b5b
[ 368.275103] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 368.276887] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[ 368.278598] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[ 368.280268] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[ 368.281917] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[ 368.283580] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[ 368.285271] irq event stamp: 0
[ 368.286151] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 368.287851] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.289798] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.291756] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 368.293419] ---[ end trace 08fc3163d80dad84 ]---
[ 368.294684] BTRFS info (device dm-3): space_info 4 has 275857408 free, is not full
[ 368.297278] BTRFS info (device dm-3): space_info total=276824064, used=114688, pinned=0, reserved=0, may_use=786432, readonly=65536
[ 368.300677] VFS: Busy inodes after unmount of dm-3. Self-destruct in 5 seconds. Have a nice day...