On Wed, Sep 11, 2019 at 5:04 PM Chris Mason <clm@xxxxxx> wrote:
>
> On 11 Sep 2019, at 15:55, fdmanana@xxxxxxxxxx wrote:
>
> > From: Filipe Manana <fdmanana@xxxxxxxx>
> >
> > So fix this by not overwriting the return value (ret) with the result
> > from flush_write_bio(). We also need to clear the
> > EXTENT_BUFFER_WRITEBACK
> > bit in case flush_write_bio() returns an error, otherwise it will hang
> > any future attempts to writeback the extent buffer.
> >
> > This is a regression introduced in the 5.2 kernel.
> >
> > Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to
> > lock_extent_buffer_for_io()")
> > Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in
> > flush_write_bio() one level up")
> > Reported-by: Zdenek Sojka <zsojka@xxxxxxxxx>
> > Link:
> > https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@xxxxxxxxx/T/#u
> > Reported-by: Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>
> > Link:
> > https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@xxxxxxxxxxxx/T/#t
> > Reported-by: Drazen Kacar <drazen.kacar@xxxxxxxxxxx>
> > Link:
> > https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
> > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> > ---
> > fs/btrfs/extent_io.c | 23 ++++++++++++++---------
> > 1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 1ff438fd5bc2..1311ba0fc031 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -3628,6 +3628,13 @@ void wait_on_extent_buffer_writeback(struct
> > extent_buffer *eb)
> > TASK_UNINTERRUPTIBLE);
> > }
> >
> > +static void end_extent_buffer_writeback(struct extent_buffer *eb)
> > +{
> > + clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
> > + smp_mb__after_atomic();
> > + wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
> > +}
> > +
> > /*
> > * Lock eb pages and flush the bio if we can't the locks
> > *
> > @@ -3699,8 +3706,11 @@ static noinline_for_stack int
> > lock_extent_buffer_for_io(struct extent_buffer *eb
> >
> > if (!trylock_page(p)) {
> > if (!flush) {
> > - ret = flush_write_bio(epd);
> > - if (ret < 0) {
> > + int err;
> > +
> > + err = flush_write_bio(epd);
> > + if (err < 0) {
> > + ret = err;
> > failed_page_nr = i;
> > goto err_unlock;
> > }
>
>
> Dennis (cc'd) has been trying a similar fix against this in production,
> but sending it was interrupted by plumbing conferences. I think he
> found that it needs to undo this as well:
>
> percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> -eb->len,
> fs_info->dirty_metadata_batch);
>
> With the IO errors, we should end up aborting the FS. This function
> also flips the the extent buffer written and dirty flags, and his patch
> resets them as well. Given that we're aborting anyway, it's not
> critical, but it's probably a good idea to fix things up in the goto
> err_unlock just to make future bugs less likely.
Yes, I considered that at some point (undo everything done so far,
under the locks, etc) and thought it was pointless as well because we
abort.
But we may not abort - if we never start the writeback for an eb
because we returned error from flush_write_bio(), we can leave
btree_write_cache_pages() without noticing the error.
Since writeback never started, and btree_write_cache_pages() didn't
return the error, the transaction commit path may never get an error
from filemap_fdatawrite_range,
and we can commit the transaction despite failure to start writeback
for some extent buffer.
A problem that existed before that regression in 5.2 anyway. Sending
it as separate.
I'll include the undo of all operations in patch however, it doesn't
hurt for sure.
Thanks.
>
> -chris