On Thu, Apr 16, 2020 at 2:16 PM Filipe Manana <fdmanana@xxxxxxxxx> wrote:
>
> On Tue, Mar 24, 2020 at 2:48 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >
> > Dave reported a problem where we were panicing with generic/475 with
> > misc-5.7. This is because we were doing IO after we had stopped all of
> > the worker threads, because we do the log tree cleanup on roots at drop
> > time. Cleaning up the log tree may need to do reads if we happened to
> > have evicted the blocks from memory.
>
> Here the "may need" is actually a "will always need", because before
> calling btrfs_free_fs_roots() at close_ctree(),
> we drop all the extent buffers from memory from the btree inode
> through the call to invalidate_inode_pages2().
>
> So this causes a use-after-free on the workqueues used for reads while
> traversing the log trees during the log dropping, since the workqueues
> were freed before right after invalidate_inode_pages2(),
> everytime we abort a transaction and we have at least one log root
> around that is big enough to not consist of only one leaf.
>
> >
> > Because of this simply add a helper to btrfs_cleanup_transaction() that
> > will go through and drop all of the log roots. This gets run before we
> > do the close_ctree() work, and thus we are allowed to do any reads that
> > we would need. I ran this through many iterations of generic/475 with
> > constrained memory and I did not see the issue.
> >
> > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>
> Fixes: 8c38938c7bb096 ("btrfs: move the root freeing stuff into btrfs_put_root")
> Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>
Also, a sample stack trace of when this happens would be good to have
in the changelog:
[24862.061099] general protection fault, probably for non-canonical
address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[24862.063225] CPU: 2 PID: 12359 Comm: umount Tainted: G W
5.6.0-rc7-btrfs-next-58 #1
[24862.064892] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[24862.067078] RIP: 0010:btrfs_queue_work+0x33/0x1c0 [btrfs]
[24862.068085] Code: 41 55 41 54 55 53 48 89 f5 48 83 ec 08 48 8b 46
70 a8 04 0f 84 a3 00 00 00 4c 8b 67 08 4d 85 e4 0f 84 96 00 00 00 4c
89 65 68 <41> 83 7c 24 64 ff 74 06 f0 41 ff 44 24 58 48 83 7d 08 00 74
41 49
[24862.071550] RSP: 0018:ffff9cfb015937d8 EFLAGS: 00010246
[24862.072424] RAX: 0000000000000000 RBX: ffff8eb5e339ed80 RCX: 0000000000000000
[24862.073754] RDX: 0000000000000001 RSI: ffff8eb5eb33b770 RDI: ffff8eb5e37a0460
[24862.075084] RBP: ffff8eb5eb33b770 R08: 000000000000020c R09: ffffffff9fc09ac0
[24862.076439] R10: 0000000000000007 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
[24862.077769] R13: ffff9cfb00229040 R14: 0000000000000008 R15: ffff8eb5d3868000
[24862.079102] FS: 00007f167ea022c0(0000) GS:ffff8eb5fae00000(0000)
knlGS:0000000000000000
[24862.080620] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24862.081696] CR2: 00007f167e5e0cb1 CR3: 0000000138c18004 CR4: 00000000003606e0
[24862.083049] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[24862.084396] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[24862.085739] Call Trace:
[24862.086250] btrfs_end_bio+0x81/0x130 [btrfs]
[24862.087025] __split_and_process_bio+0xaf/0x4e0 [dm_mod]
[24862.087980] ? percpu_counter_add_batch+0xa3/0x120
[24862.088831] dm_process_bio+0x98/0x290 [dm_mod]
[24862.089695] ? generic_make_request+0xfb/0x410
[24862.090523] dm_make_request+0x4d/0x120 [dm_mod]
[24862.091101] ? generic_make_request+0xfb/0x410
[24862.091893] generic_make_request+0x12a/0x410
[24862.092724] ? submit_bio+0x38/0x160
[24862.093403] submit_bio+0x38/0x160
[24862.094052] ? percpu_counter_add_batch+0xa3/0x120
[24862.094987] btrfs_map_bio+0x289/0x570 [btrfs]
[24862.095844] ? kmem_cache_alloc+0x24d/0x300
[24862.096668] btree_submit_bio_hook+0x79/0xc0 [btrfs]
[24862.097621] submit_one_bio+0x31/0x50 [btrfs]
[24862.098460] read_extent_buffer_pages+0x2fe/0x450 [btrfs]
[24862.099490] btree_read_extent_buffer_pages+0x7e/0x170 [btrfs]
[24862.100613] walk_down_log_tree+0x343/0x690 [btrfs]
[24862.101551] ? walk_log_tree+0x3d/0x380 [btrfs]
[24862.102423] walk_log_tree+0xf7/0x380 [btrfs]
[24862.103244] ? plist_requeue+0xf0/0xf0
[24862.103946] ? delete_node+0x4b/0x230
[24862.104669] free_log_tree+0x4c/0x130 [btrfs]
[24862.105519] ? wait_log_commit+0x140/0x140 [btrfs]
[24862.106434] btrfs_free_log+0x17/0x30 [btrfs]
[24862.107274] btrfs_drop_and_free_fs_root+0xb0/0xd0 [btrfs]
[24862.108025] btrfs_free_fs_roots+0x10c/0x190 [btrfs]
[24862.108673] ? do_raw_spin_unlock+0x49/0xc0
[24862.109197] ? _raw_spin_unlock+0x29/0x40
[24862.109708] ? release_extent_buffer+0x121/0x170 [btrfs]
[24862.110379] close_ctree+0x289/0x2e6 [btrfs]
[24862.110911] generic_shutdown_super+0x6c/0x110
[24862.111461] kill_anon_super+0xe/0x30
[24862.111926] btrfs_kill_super+0x12/0x20 [btrfs]
[24862.112498] deactivate_locked_super+0x3a/0x70
[24862.113043] cleanup_mnt+0xb4/0x150
[24862.113478] task_work_run+0x7e/0xc0
[24862.114108] exit_to_usermode_loop+0xfa/0x100
[24862.114924] do_syscall_64+0x20c/0x280
[24862.115622] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[24862.116557] RIP: 0033:0x7f167e2ecb37
[24862.117169] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 31 03 2b 00 f7 d8 64 89
01 48
[24862.120429] RSP: 002b:00007ffdd2cceb58 EFLAGS: 00000246 ORIG_RAX:
00000000000000a6
[24862.121729] RAX: 0000000000000000 RBX: 000055de5c568060 RCX: 00007f167e2ecb37
[24862.122969] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055de5c568240
[24862.124221] RBP: 000055de5c568240 R08: 000055de5c568270 R09: 0000000000000015
[24862.125449] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f167e7eee64
[24862.126659] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdd2ccede0
[24862.127611] Modules linked in: btrfs dm_mod loop blake2b_generic
xor raid6_pq libcrc32c intel_rapl_msr intel_rapl_common kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel bochs_drm
aesni_intel crypto_simd drm_vram_helper drm_ttm_helper cryptd
glue_helper ttm drm_kms_helper joydev evdev drm sg pcspkr serio_raw
button qemu_fw_cfg parport_pc ppdev lp parport ip_tables x_tables
autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sd_mod t10_pi
virtio_scsi ata_generic ata_piix crc32c_intel libata psmouse scsi_mod
virtio_pci e1000 virtio_ring virtio i2c_piix4 [last unloaded: btrfs]
[24862.134473] ---[ end trace 7ec1a70e82f52891 ]---
David, this is a regression introduced in 5.7-rc1, and it's very easy
to hit with generic/475.
Any reason this wasn't picked yet, despite having been posted several weeks ago?
Thanks.
>
> Thanks.
>
> >
> > ---
> > fs/btrfs/disk-io.c | 36 ++++++++++++++++++++++++++++++++----
> > 1 file changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index a6cb5cbbdb9f..d10c7be10f3b 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -2036,9 +2036,6 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info)
> > for (i = 0; i < ret; i++)
> > btrfs_drop_and_free_fs_root(fs_info, gang[i]);
> > }
> > -
> > - if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
> > - btrfs_free_log_root_tree(NULL, fs_info);
> > }
> >
> > static void btrfs_init_scrub(struct btrfs_fs_info *fs_info)
> > @@ -3888,7 +3885,7 @@ void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
> > spin_unlock(&fs_info->fs_roots_radix_lock);
> >
> > if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
> > - btrfs_free_log(NULL, root);
> > + ASSERT(root->log_root == NULL);
> > if (root->reloc_root) {
> > btrfs_put_root(root->reloc_root);
> > root->reloc_root = NULL;
> > @@ -4211,6 +4208,36 @@ static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info)
> > up_write(&fs_info->cleanup_work_sem);
> > }
> >
> > +static void btrfs_drop_all_logs(struct btrfs_fs_info *fs_info)
> > +{
> > + struct btrfs_root *gang[8];
> > + u64 root_objectid = 0;
> > + int ret;
> > +
> > + spin_lock(&fs_info->fs_roots_radix_lock);
> > + while ((ret = radix_tree_gang_lookup(&fs_info->fs_roots_radix,
> > + (void **)gang, root_objectid,
> > + ARRAY_SIZE(gang))) != 0) {
> > + int i;
> > +
> > + for (i = 0; i < ret; i++)
> > + gang[i] = btrfs_grab_root(gang[i]);
> > + spin_unlock(&fs_info->fs_roots_radix_lock);
> > +
> > + for (i = 0; i < ret; i++) {
> > + if (!gang[i])
> > + continue;
> > + root_objectid = gang[i]->root_key.objectid;
> > + btrfs_free_log(NULL, gang[i]);
> > + btrfs_put_root(gang[i]);
> > + }
> > + root_objectid++;
> > + spin_lock(&fs_info->fs_roots_radix_lock);
> > + }
> > + spin_unlock(&fs_info->fs_roots_radix_lock);
> > + btrfs_free_log_root_tree(NULL, fs_info);
> > +}
> > +
> > static void btrfs_destroy_ordered_extents(struct btrfs_root *root)
> > {
> > struct btrfs_ordered_extent *ordered;
> > @@ -4603,6 +4630,7 @@ static int btrfs_cleanup_transaction(struct btrfs_fs_info *fs_info)
> > btrfs_destroy_delayed_inodes(fs_info);
> > btrfs_assert_delayed_root_empty(fs_info);
> > btrfs_destroy_all_delalloc_inodes(fs_info);
> > + btrfs_drop_all_logs(fs_info);
> > mutex_unlock(&fs_info->transaction_kthread_mutex);
> >
> > return 0;
> > --
> > 2.17.1
> >
>
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”