On Tue, Mar 07, 2017 at 12:49:58PM -0800, Liu Bo wrote:
> On Mon, Mar 06, 2017 at 10:55:46AM +0800, Qu Wenruo wrote:
> > [BUG]
> > When btrfs_reloc_clone_csum() reports error, it can underflow metadata
> > and leads to kernel assertion on outstanding extents in
> > run_delalloc_nocow() and cow_file_range().
> >
> > BTRFS info (device vdb5): relocating block group 12582912 flags data
> > BTRFS info (device vdb5): found 1 extents
> > assertion failed: inode->outstanding_extents >= num_extents, file: fs/btrfs//extent-tree.c, line: 5858
> >
> > Currently, due to another bug blocking ordered extents, the bug is only
> > reproducible under certain block group layout and using error injection.
> >
> > a) Create one data block group with one 4K extent in it.
> > To avoid the bug that hangs btrfs due to ordered extent which never
> > finishes
> > b) Make btrfs_reloc_clone_csum() always fail
> > c) Relocate that block group
> >
> > [CAUSE]
> > run_delalloc_nocow() and cow_file_range() handles error from
> > btrfs_reloc_clone_csum() wrongly:
> >
> > (The ascii chart shows a more generic case of this bug other than the
> > bug mentioned above)
> >
> > |<------------------ delalloc range --------------------------->|
> > | OE 1 | OE 2 | ... | OE n |
> > |<----------- cleanup range --------------->|
> > |<----------- ----------->|
> > \/
> > btrfs_finish_ordered_io() range
> >
> > So error handler, which calls extent_clear_unlock_delalloc() with
> > EXTENT_DELALLOC and EXTENT_DO_ACCOUNT bits, and btrfs_finish_ordered_io()
> > will both cover OE n, and free its metadata, causing metadata under flow.
> >
> > [Fix]
> > The fix is to ensure after calling btrfs_add_ordered_extent(), we only
> > call error handler after increasing the iteration offset, so that
> > cleanup range won't cover any created ordered extent.
> >
> > |<------------------ delalloc range --------------------------->|
> > | OE 1 | OE 2 | ... | OE n |
> > |<----------- ----------->|<---------- cleanup range --------->|
> > \/
> > btrfs_finish_ordered_io() range
> >
> > Signed-off-by: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
> > ---
> > changelog:
> > v6:
> > New, split from v5 patch, as this is a separate bug.
> > ---
> > fs/btrfs/inode.c | 51 +++++++++++++++++++++++++++++++++++++++------------
> > 1 file changed, 39 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b2bc07aad1ae..1d83d504f2e5 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -998,15 +998,24 @@ static noinline int cow_file_range(struct inode *inode,
> > BTRFS_DATA_RELOC_TREE_OBJECTID) {
> > ret = btrfs_reloc_clone_csums(inode, start,
> > cur_alloc_size);
> > + /*
> > + * Only drop cache here, and process as normal.
> > + *
> > + * We must not allow extent_clear_unlock_delalloc()
> > + * at out_unlock label to free meta of this ordered
> > + * extent, as its meta should be freed by
> > + * btrfs_finish_ordered_io().
> > + *
> > + * So we must continue until @start is increased to
> > + * skip current ordered extent.
> > + */
> > if (ret)
> > - goto out_drop_extent_cache;
> > + btrfs_drop_extent_cache(BTRFS_I(inode), start,
> > + start + ram_size - 1, 0);
> > }
> >
> > btrfs_dec_block_group_reservations(fs_info, ins.objectid);
> >
> > - if (disk_num_bytes < cur_alloc_size)
> > - break;
> > -
> > /* we're not doing compressed IO, don't unlock the first
> > * page (which the caller expects to stay locked), don't
> > * clear any dirty bits and don't set any writeback bits
> > @@ -1022,10 +1031,21 @@ static noinline int cow_file_range(struct inode *inode,
> > delalloc_end, locked_page,
> > EXTENT_LOCKED | EXTENT_DELALLOC,
> > op);
> > - disk_num_bytes -= cur_alloc_size;
> > + if (disk_num_bytes > cur_alloc_size)
> > + disk_num_bytes = 0;
> > + else
> > + disk_num_bytes -= cur_alloc_size;
>
> I don't get the logic here, why do we 'break' if disk_num_bytes > cur_alloc_size?
I assume that you've run fstests against this patch, if so, I actually
start worrying about that no fstests found this problem.
Thanks,
-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html