Re: [PATCH][RFC] btrfs: introduce rescue=onlyfs

On Thu, Jul 2, 2020 at 7:38 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
>
>
> On 2020/7/2 11:28 PM, Josef Bacik wrote:
> > On 7/1/20 11:09 PM, Qu Wenruo wrote:
> >>
> >>
> >> On 2020/7/2 3:53 AM, Josef Bacik wrote:
> >>> On 7/1/20 3:43 PM, waxhead wrote:
> >>>>
> >>>>
> >>>> Josef Bacik wrote:
> >>>>> One of the things that came up consistently in talking with Fedora
> >>>>> about
> >>>>> switching to btrfs as default is that btrfs is particularly vulnerable
> >>>>> to metadata corruption.  If any of the core global roots are
> >>>>> corrupted,
> >>>>> the fs is unmountable and fsck can't usually do anything for you
> >>>>> without
> >>>>> some special options.
> >>>>>
> >>>>> Qu addressed this sort of with rescue=skipbg, but that's poorly
> >>>>> named as
> >>>>> what it really does is just allow you to operate without an extent
> >>>>> root.
> >>>>> However there are a lot of other roots, and I'd rather not have to do
> >>>>>
> >>>>> mount -o
> >>>>> rescue=skipbg,rescue=nocsum,rescue=nofreespacetree,rescue=blah
> >>>>>
> >>>>> Instead take his original idea and modify it so it just works for
> >>>>> everything.  Turn it into rescue=onlyfs, and then any major root we
> >>>>> fail
> >>>>> to read just gets left empty and we carry on.
> >>>>>
> >>>>> Obviously if the fs roots are screwed then the user is in trouble, but
> >>>>> otherwise this makes it much easier to pull stuff off the disk without
> >>>>> needing our special rescue tools.  I tested this with my TEST_DEV that
> >>>>> had a bunch of data on it by corrupting the csum tree and then reading
> >>>>> files off the disk.
> >>>>>
> >>>>> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> >>>>> ---
> >>>>
> >>>> Just an idea inspired by RAID1c3 and RAID1c4: how about introducing
> >>>> DUP2 and/or even DUP3, making multiple copies of the metadata to
> >>>> increase the chance of recovering metadata even on a single storage
> >>>> device?
> >>>
> >>> Because this only works on HDDs.  On SSDs, concurrent writes will often
> >>> be shunted to the same erase block, and if the whole erase block goes,
> >>> so do all of your copies.  This is why we default to 'single' for SSDs.
> >>>
> >>> The one thing I _do_ want to do is make better use of the backup roots.
> >>> Right now we always free the pinned extents once the transaction
> >>> commits, which makes the backup roots useless as we're likely to re-use
> >>> those blocks.
> >>
> >> IIRC Filipe tried this before and didn't go in that direction due to
> >> ENOSPC, as we need to commit multiple transactions to free the pinned
> >> extents.
> >>
> >> But maybe the latest async pinned extent drop could solve the problem?
> >>
> >
> > Yeah, before it was tricky, but Nikolay's work made async pinned
> > extent drop possible, and I've been testing that patch internally.
> >
> > Now it's just a matter of keeping the last 4 transactions' worth of
> > pinned extents around and only unpinning under ENOSPC conditions.
> > I'll dig out the async unpinning and send that up next week since
> > that's already valuable by itself, and then we can talk about wiring
> > up the ENOSPC part of it.  Thanks,
>
> That's really awesome, let's make btrfs the most bulletproof fs then!
>

Woohoo, yes! Go! 💪
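
To make the retention idea quoted above concrete, here is a rough
userspace sketch of "keep the last 4 transactions' worth of pinned
extents, and only unpin early under ENOSPC".  It is a toy model, not
kernel code: struct pinned_history, commit_transaction(),
unpin_oldest() and reserve_bytes() are all hypothetical names, and
space is tracked as plain byte counts rather than extents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* From the thread: keep the last 4 transactions' worth of pinned space. */
#define KEEP_TRANSACTIONS 4

/* Hypothetical model: bytes left pinned by one committed transaction. */
struct pinned_slot {
	uint64_t transid;
	uint64_t bytes;
};

struct pinned_history {
	struct pinned_slot slots[KEEP_TRANSACTIONS]; /* oldest first */
	size_t count;                                /* valid slots */
	uint64_t free_bytes;                         /* allocatable space */
};

/* Release the oldest retained transaction; returns the bytes freed. */
static uint64_t unpin_oldest(struct pinned_history *h)
{
	uint64_t freed;
	size_t i;

	if (h->count == 0)
		return 0;

	freed = h->slots[0].bytes;
	for (i = 1; i < h->count; i++)
		h->slots[i - 1] = h->slots[i];
	h->count--;
	h->free_bytes += freed;
	return freed;
}

/* Record a commit, dropping the oldest slot once the window is full. */
static void commit_transaction(struct pinned_history *h, uint64_t transid,
			       uint64_t pinned_bytes)
{
	if (h->count == KEEP_TRANSACTIONS)
		unpin_oldest(h);

	assert(h->count < KEEP_TRANSACTIONS);
	h->slots[h->count].transid = transid;
	h->slots[h->count].bytes = pinned_bytes;
	h->count++;
}

/*
 * Reserve space.  Retained pinned space is only given up early when
 * the request would otherwise fail (the ENOSPC case).
 * Returns 0 on success, -1 if even full unpinning is not enough.
 */
static int reserve_bytes(struct pinned_history *h, uint64_t want)
{
	while (h->free_bytes < want) {
		if (h->count == 0)
			return -1; /* nothing left to unpin */
		unpin_oldest(h);
	}
	h->free_bytes -= want;
	return 0;
}
```

In this model the backup roots for the retained transactions stay
usable until either four newer commits have happened or an allocation
forces an early unpin, oldest transaction first.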


-- 
真実はいつも一つ!/ Always, there's only one truth!



