Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more

On Tue, Dec 05, 2017 at 11:04:03AM -0700, Liu Bo wrote:
> On Tue, Dec 05, 2017 at 04:07:35PM +0800, Qu Wenruo wrote:
> > 
> > 
> > On 2017-12-05 06:40, Liu Bo wrote:
> > > There is a scenario that can end up with the rebuild process failing
> > > to return good content, i.e. suppose that all disks can be read
> > > without problems but the content that was read out doesn't match its
> > > checksum; currently raid6 btrfs retries at most twice,
> > > 
> > > - the 1st retry is to rebuild with all other stripes, it'll eventually
> > >   be a raid5 xor rebuild,
> > > - if the 1st fails, the 2nd retry will deliberately fail parity p so
> > >   that it will do raid6 style rebuild,
> > > 
> > > however, the chances are that another non-parity stripe's content is
> > > also corrupted, so that the above retries are not able to return
> > > correct content, and users will see this as data loss.  More
> > > seriously, if the loss happens on some important internal btree
> > > roots, the filesystem could refuse to mount.
> > > 
> > > This extends btrfs to do more retries and each retry fails only one
> > > stripe.  Since raid6 can tolerate 2 disk failures, if there is one
> > > more failure besides the failure on which we're recovering, this can
> > > always work.
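Side note: below is a minimal, self-contained userspace sketch of the
retry policy described above.  All names (NR_STRIPES, rebuild_and_verify)
and the exact mirror-to-stripe mapping are made up for illustration; this
is not the btrfs implementation, only a model of "each further retry
deliberately fails one more stripe":

/*
 * Illustrative sketch only, not kernel code: mirror 2 reconstructs from
 * all other stripes, and every retry after that marks one more stripe as
 * failed so the raid6 math is forced to route around it.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_STRIPES 6	/* hypothetical layout: 4 data + P + Q */

/* stand-in for the real rebuild + checksum verification */
static bool rebuild_and_verify(int bad_stripe, int extra_failed)
{
	(void)bad_stripe;
	/* pretend the data only checks out once stripe 2 is also failed */
	return extra_failed == 2;
}

int main(void)
{
	int bad_stripe = 0;	/* the stripe whose csum failed originally */
	int mirror;

	for (mirror = 2; mirror < 2 + NR_STRIPES; mirror++) {
		/* mirror 2: no extra failure; mirror 3+: fail one more stripe */
		int extra_failed = (mirror == 2) ? -1 : mirror - 3;

		if (extra_failed == bad_stripe)
			continue;	/* that stripe is already failed */

		if (rebuild_and_verify(bad_stripe, extra_failed)) {
			printf("recovered at mirror %d\n", mirror);
			return 0;
		}
	}
	printf("all retries exhausted\n");
	return 1;
}

In the worst case this loops once per raid6 disk, which matches the bound
given in the commit message.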
> > 
> > This should be the correct behavior for RAID6: try every possible
> > combination until all combinations are exhausted or the correct data
> > can be recovered.
> > 
> > > 
> > > The worst case is to retry as many times as the number of raid6 disks,
> > > but given the fact that such a scenario is really rare in practice,
> > > it's still acceptable.
> > 
> > And even if we retried that many times, I don't think it would be a
> > big problem.  Since most of this happens purely in memory, it should
> > be fast enough that no obvious impact can be observed.
> >
> 
> It's basically a while loop, so it may cause some delay/hang, but
> anyway, such a case is rare.
> 
> > Apart from some small nitpicks inlined below, the idea looks pretty
> > good to me.
> > 
> > Reviewed-by: Qu Wenruo <wqu@xxxxxxxx>
> > 
> > > 
> > > Signed-off-by: Liu Bo <bo.li.liu@xxxxxxxxxx>
> > > ---
> > >  fs/btrfs/raid56.c  | 18 ++++++++++++++----
> > >  fs/btrfs/volumes.c |  9 ++++++++-
> > >  2 files changed, 22 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > > index 8d09535..064d5bc 100644
> > > --- a/fs/btrfs/raid56.c
> > > +++ b/fs/btrfs/raid56.c
> > > @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
> > >  	}
> > >  
> > >  	/*
> > > -	 * reconstruct from the q stripe if they are
> > > -	 * asking for mirror 3
> > > +	 * Loop retry:
> > > +	 * for 'mirror == 2', reconstruct from all other stripes.
> > 
> > What about using a macro to make the reassembly method more human
> > readable?
> > 
> > And for the mirror == 2 case, by "rebuild from all" do you mean
> > rebuilding using all remaining data stripes + P? The word "all" here
> > is a little confusing.
> >
> 
> Thank you for the comments.
> 
> It depends: if all other stripes are good to read, then it'd do
> 'data+p', which is a raid5 xor rebuild; if some disks also fail, then
> it may do 'data+p+q' or 'data+q'.
> 
> Is it better to say "for mirror == 2, reconstruct from other available
> stripes"?

Yes, it is.  You can also add the examples from the previous paragraph.
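To illustrate the 'data+p' path Liu Bo mentions (the raid5 xor rebuild),
here is a tiny self-contained sketch.  The stripe sizes and buffers are
made up, and the 'data+p+q' / 'data+q' paths, which need the Galois-field
math from the raid6 library, are not shown:

/*
 * Sketch of the "data + P" reconstruction path: P is the xor of all data
 * stripes, so a single lost data stripe is the xor of P with the
 * surviving data stripes.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define STRIPE_LEN 8		/* tiny stripe just for the demo */
#define NR_DATA 3

int main(void)
{
	uint8_t data[NR_DATA][STRIPE_LEN] = {
		"AAAAAAA", "BBBBBBB", "CCCCCCC",
	};
	uint8_t p[STRIPE_LEN] = { 0 };
	uint8_t rebuilt[STRIPE_LEN] = { 0 };
	int lost = 1;		/* pretend data stripe 1 failed its csum */
	int i, j;

	/* P is the xor of all data stripes */
	for (i = 0; i < NR_DATA; i++)
		for (j = 0; j < STRIPE_LEN; j++)
			p[j] ^= data[i][j];

	/* rebuild the lost stripe from P and the surviving data stripes */
	memcpy(rebuilt, p, STRIPE_LEN);
	for (i = 0; i < NR_DATA; i++) {
		if (i == lost)
			continue;
		for (j = 0; j < STRIPE_LEN; j++)
			rebuilt[j] ^= data[i][j];
	}

	printf("rebuilt stripe %d: %.7s\n", lost, (char *)rebuilt);
	return 0;
}

Running it prints the content of the deliberately failed stripe, recovered
purely from the surviving stripes plus P.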



