On Thu, Nov 2, 2017 at 7:04 AM, Qu Wenruo <wqu@xxxxxxxx> wrote:
> [BUG]
> If we run btrfs with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y, it will
> instantly cause kernel panic like:
>
> ------
> ...
> assertion failed: 0, file: fs/btrfs/disk-io.c, line: 3853
> ...
> Call Trace:
> btrfs_mark_buffer_dirty+0x187/0x1f0 [btrfs]
> setup_items_for_insert+0x385/0x650 [btrfs]
> __btrfs_drop_extents+0x129a/0x1870 [btrfs]
> ...
> ------
>
> [Cause]
> Btrfs will call btrfs_check_leaf() in btrfs_mark_buffer_dirty() to check
> if the leaf is valid with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y.
>
> However some btrfs_mark_buffer_dirty() caller, like
> setup_items_for_insert(), doesn't really initialize its item data but
> only initialize its item pointers, leaving item data uninitialized.
So instead of doing this juggling, the best would be to have it not call
mark_buffer_dirty(), and leave that responsibility for the caller after
it initializes the item data. I give you a very good reason for that below.
>
> This makes tree-checker catch uninitialized data as error, causing
> such panic.
>
> [Fix]
> The correct timing to check leaf validation should be before write IO or
> after read IO.
>
> Just like ee have already done the tree validation check at btree
> readpage end io hook, this patch will move the write time tree checker to
> csum_dirty_buffer().
>
> As csum_dirty_buffer() is called just before submitting btree write bio, as
> the call path shows:
>
> btree_submit_bio_hook()
> |- __btree_submit_bio_start()
> |- btree_csum_one_bio()
> |- csum_dirty_buffer()
> |- btrfs_check_leaf()
>
> By this we can ensure the leaf passed in is in consistent status, and
> can check them without causing tons of false alert.
>
> Reported-by: Lakshmipathi.G <lakshmipathi.g@xxxxxxxxx>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
> fs/btrfs/disk-io.c | 26 +++++++++++++++++++-------
> 1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index efce9a2fa9be..6c17bce2a05e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -506,6 +506,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
> u64 start = page_offset(page);
> u64 found_start;
> struct extent_buffer *eb;
> + int ret;
>
> eb = (struct extent_buffer *)page->private;
> if (page != eb->pages[0])
> @@ -524,7 +525,24 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
> ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
> btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
>
> - return csum_tree_block(fs_info, eb, 0);
> + ret = csum_tree_block(fs_info, eb, 0);
> + if (ret)
> + return ret;
> +
> +#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> + /*
> + * Do extra check before we write the tree block into disk.
> + */
> + if (btrfs_header_level(eb) == 0) {
> + ret = btrfs_check_leaf(fs_info->tree_root, eb);
> + if (ret) {
> + btrfs_print_leaf(eb);
> + ASSERT(0);
> + return ret;
> + }
> + }
> +#endif
> + return 0;
> }
>
> static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
> @@ -3847,12 +3865,6 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
> percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> buf->len,
> fs_info->dirty_metadata_batch);
> -#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> - if (btrfs_header_level(buf) == 0 && btrfs_check_leaf(root, buf)) {
> - btrfs_print_leaf(buf);
> - ASSERT(0);
> - }
> -#endif
So there's a reason why btrfs_check_leaf() was called here, at
mark_buffer_dirty(),
instead of somewhere else like csum_dirty_buffer().
The reason is that once some bad code inserts a key out of order for
example (or did any
other bad stuff that check_leaf() catched before you added the
tree-checker thing), we
would get a trace that pinpoints exactly where the bad code is. With
this change, we will
only know some is bad when writeback of the leaf starts, and before
that happens, the leaf might
have been changed dozens of times by many different functions (and
this happens very
often, it's far from being a unusual case), in which case the given
trace won't tell you which code
misbehaved. This makes it harder to find out bugs, and as it used to
be it certainly helped me in
the past several times. IOW, I would prefer what I mentioned earlier
or, at very least, do those new
checks that validate data only at writeback start time.
> }
>
> static void __btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info,
> --
> 2.14.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html