On Thu, Apr 25, 2013 at 07:18:59PM +0200, David Sterba wrote: > On Thu, Apr 18, 2013 at 04:42:18PM +0200, David Sterba wrote: > > [64394.422743] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 > > [64394.426716] RIP: 0010:[<ffffffffa0010e0f>] [<ffffffffa0010e0f>] btrfs_search_slot+0xbf/0x9e0 [btrfs] > > [64394.426716] [<ffffffffa0014977>] btrfs_next_old_leaf+0x247/0x4e0 [btrfs] > > [64394.426716] [<ffffffffa0014c20>] btrfs_next_leaf+0x10/0x20 [btrfs] > > The bisection set was reduced to these patches (on top of > cmason/for-linus 4bc4bee45): > > 1 Btrfs: cleanup unused function Liu Bo > 2 btrfs: enhance superblock checks David Sterba > 3 Btrfs: add some free space cache tests Josef Bacik > 4 btrfs: merge save_error_info helpers into one David Sterba > 5 btrfs: clean up transaction abort messages David Sterba > 6 Btrfs: cleanup unused arguments of btrfs_csum_data Liu Bo > 7 Btrfs: use helper to cleanup tree roots Liu Bo > 8 Btrfs: add a incompatible format change for smaller metadata extent ref Josef Bacik > > The bisecting process points to patch 8, ie the 'first bad', but the > reproducer has proved to be unreliable and I think it's very sensitive > to scheduling timing. > > Reproducer is to simply run 273 in a loop on a single/single filesystem, > the null deref happens during umount. The tricky part is that it does not > happen every time, I must not touch the testbox nor let any process run > except 'dstat'. It usually crashed on first or second test run, but I've > left it up to 10 to be sure. > > I don't think it's caused by the skinny metadata, because it does not > touch the affected functions, but somehow helps to make it visible. > > Although the test never crashed when patch 7 > Btrfs: use helper to cleanup tree roots > http://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/commit/?id=3fa215686c574d26f43f1bcf6c9f69658e02908f > was on top (I once left it running overnight, all fine), that's my main > suspect: > > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3483,20 +3483,7 @@ int close_ctree(struct btrfs_root *root) > percpu_counter_sum(&fs_info->delalloc_bytes)); > } > > - free_extent_buffer(fs_info->extent_root->node); > - free_extent_buffer(fs_info->extent_root->commit_root); > - free_extent_buffer(fs_info->tree_root->node); > - free_extent_buffer(fs_info->tree_root->commit_root); > - free_extent_buffer(fs_info->chunk_root->node); > - free_extent_buffer(fs_info->chunk_root->commit_root); > - free_extent_buffer(fs_info->dev_root->node); > - free_extent_buffer(fs_info->dev_root->commit_root); > - free_extent_buffer(fs_info->csum_root->node); > - free_extent_buffer(fs_info->csum_root->commit_root); > - if (fs_info->quota_root) { > - free_extent_buffer(fs_info->quota_root->node); > - free_extent_buffer(fs_info->quota_root->commit_root); > - } > + free_root_pointers(fs_info, 1); > > and if you look what free_root_pointers does, sets all the pointers to NULL. > > The crash site: > > gdb) l *(btrfs_search_slot+0xb6) > 0x11076 is in btrfs_search_slot (/home/dsterba/linux-2.6/arch/x86/include/asm/atomic.h:95). > 90 * > 91 * Atomically increments @v by 1. > 92 */ > 93 static inline void atomic_inc(atomic_t *v) > 94 { > 95 asm volatile(LOCK_PREFIX "incl %0" > 96 : "+m" (v->counter)); > 97 } > > (gdb) l *(btrfs_search_slot+0xb5) > 0x11075 is in btrfs_search_slot (fs/btrfs/ctree.c:2513). > 2508 if (p->search_commit_root) { > 2509 /* > 2510 * the commit roots are read only > 2511 * so we always do read locks > 2512 */ > 2513 b = root->commit_root; > 2514 extent_buffer_get(b); > 2515 level = btrfs_header_level(b); > 2516 if (!p->skip_locking) > 2517 btrfs_tree_read_lock(b); > > so it's the extent_buffer_get() call, the offset of ->refs field is 120 = 0x78, > matches "NULL pointer dereference at 0000000000000078". So root->commit_root > is NULL. > > If we look what happens in disk-io.c::close_ctree: > > 3476 fs_info->closing = 2; > 3477 smp_mb(); > ... > 3486 free_root_pointers(fs_info, 1); > 3487 > 3488 btrfs_free_block_groups(fs_info); > 3489 > 3490 del_fs_roots(fs_info); > 3491 > 3492 iput(fs_info->btree_inode); > 3493 > 3494 btrfs_stop_workers(&fs_info->generic_worker); > ... > 3507 btrfs_stop_workers(&fs_info->caching_workers); > > Line 3477 assures that caching thread will exit the main loop when it sees > fs_closing > 1, but if it's past this check yet has to do some work, and > close_tree() calls free_root_pointers(), bang. As we can see the caching thread > goes down later. > > What I see here is a use-after-free that has been undetected so far and only > exposed by patch 7. > > This diff looks like the fix (on top of patch 8): > > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3486,14 +3486,14 @@ int close_ctree(struct btrfs_root *root) > percpu_counter_sum(&fs_info->delalloc_bytes)); > } > > - free_root_pointers(fs_info, 1); > - > btrfs_free_block_groups(fs_info); > > del_fs_roots(fs_info); > > iput(fs_info->btree_inode); > > + free_root_pointers(fs_info, 1); > + > btrfs_stop_workers(&fs_info->generic_worker); > btrfs_stop_workers(&fs_info->fixup_workers); > btrfs_stop_workers(&fs_info->delalloc_workers); > --- > > > david Thanks for tracking it Dave, I'm trying to reproduce it here. thanks, liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
