On Thu, Jul 2, 2020 at 7:38 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
> On 2020/7/2 11:28 PM, Josef Bacik wrote:
> > On 7/1/20 11:09 PM, Qu Wenruo wrote:
> >>
> >> On 2020/7/2 3:53 AM, Josef Bacik wrote:
> >>> On 7/1/20 3:43 PM, waxhead wrote:
> >>>>
> >>>> Josef Bacik wrote:
> >>>>> One of the things that came up consistently in talking with Fedora
> >>>>> about switching to btrfs as default is that btrfs is particularly
> >>>>> vulnerable to metadata corruption. If any of the core global roots
> >>>>> are corrupted, the fs is unmountable and fsck can't usually do
> >>>>> anything for you without some special options.
> >>>>>
> >>>>> Qu sort of addressed this with rescue=skipbg, but that's poorly
> >>>>> named, as what it really does is just allow you to operate without
> >>>>> an extent root. However, there are a lot of other roots, and I'd
> >>>>> rather not have to do
> >>>>>
> >>>>> mount -o rescue=skipbg,rescue=nocsum,rescue=nofreespacetree,rescue=blah
> >>>>>
> >>>>> Instead, take his original idea and modify it so it just works for
> >>>>> everything. Turn it into rescue=onlyfs, and then any major root we
> >>>>> fail to read just gets left empty and we carry on.
> >>>>>
> >>>>> Obviously if the fs roots are screwed then the user is in trouble,
> >>>>> but otherwise this makes it much easier to pull stuff off the disk
> >>>>> without needing our special rescue tools. I tested this by
> >>>>> corrupting the csum tree on my TEST_DEV, which had a bunch of data
> >>>>> on it, and then reading files off the disk.
> >>>>>
> >>>>> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> >>>>> ---
> >>>>
> >>>> Just an idea inspired by RAID1c3 and RAID1c4: how about introducing
> >>>> DUP2 and/or even DUP3, making multiple copies of the metadata to
> >>>> increase the chance of recovering metadata even on a single storage
> >>>> device?
> >>>
> >>> Because this only works on HDDs. On SSDs, concurrent writes will often
> >>> be shunted to the same erase block, and if the whole erase block goes,
> >>> so do all of your copies. This is why we default to 'single' for SSDs.
> >>>
> >>> The one thing I _do_ want to do is make better use of the backup roots.
> >>> Right now we always free the pinned extents once the transaction
> >>> commits, which makes the backup roots useless, as we're likely to
> >>> reuse those blocks.
> >>
> >> IIRC Filipe tried this before and didn't go in that direction due to
> >> ENOSPC, as we need to commit multiple transactions to free the pinned
> >> extents.
> >>
> >> But maybe the latest async pinned extent drop could solve the problem?
> >
> > Yeah, before it was tricky, but Nikolay's work made async pinned extent
> > drop possible; I've been testing that patch internally.
> >
> > Now it's just a matter of keeping the last 4 transactions' worth of
> > pinned extents around and only unpinning under ENOSPC conditions. I'll
> > dig out the async unpinning and send that up next week since that's
> > already valuable by itself, and then we can talk about wiring up the
> > ENOSPC part of it. Thanks,
>
> That's really awesome, let's make btrfs the most bulletproof fs then!

Woohoo, yes! Go! 💪

--
真実はいつも一つ!/ Always, there's only one truth!
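The rescue=onlyfs behaviour proposed in the patch description (read each global root, leave any that fails empty and keep going, while an unreadable fs tree stays fatal) can be modelled in a few lines. This is a hedged toy sketch in Python, not btrfs kernel code: the root names, the `read_root` callback, `RootReadError`, `MountState`, and `mount_with_rescue` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class RootReadError(Exception):
    """Hypothetical error raised when a tree root cannot be read."""


# Hypothetical names: "global" roots that rescue=onlyfs may skip,
# and the fs tree, which it never skips.
GLOBAL_ROOTS = ("extent", "csum", "free_space")
FS_ROOT = "fs"


@dataclass
class MountState:
    roots: Dict[str, Optional[dict]] = field(default_factory=dict)
    skipped: List[str] = field(default_factory=list)


def mount_with_rescue(read_root: Callable[[str], dict], onlyfs: bool) -> MountState:
    state = MountState()
    for name in GLOBAL_ROOTS:
        try:
            state.roots[name] = read_root(name)
        except RootReadError:
            if not onlyfs:
                raise  # normal mount: a bad global root fails the mount
            state.roots[name] = None  # rescue: leave this root empty, carry on
            state.skipped.append(name)
    # Without a readable fs tree there is nothing to rescue, so errors propagate.
    state.roots[FS_ROOT] = read_root(FS_ROOT)
    return state


if __name__ == "__main__":
    def corrupt_csum(name: str) -> dict:
        # Simulates the test described in the patch: only the csum tree is bad.
        if name == "csum":
            raise RootReadError(name)
        return {"root": name}

    print(mount_with_rescue(corrupt_csum, onlyfs=True).skipped)  # prints ['csum']
```

Keeping the fs tree outside the skip path mirrors the patch's point that corrupted fs roots leave the user in trouble regardless; every other root degrades to "empty" instead of failing the whole mount.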

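Josef's follow-up plan (keep the last 4 transactions' worth of pinned extents so the backup roots stay usable, and only unpin under ENOSPC) can be sketched as a small retention policy. Again this is a hypothetical Python model under simplifying assumptions: extents are plain integers, each committed transaction contributes one pinned set, and allocation is a single call. It is not how btrfs actually tracks pinned extents, and `PinnedHistory`, `commit`, and `allocate` are invented for illustration.

```python
import errno
from collections import deque
from typing import Deque, Iterable, Set


class PinnedHistory:
    """Hypothetical model: retain the extents pinned by the last `keep`
    committed transactions so blocks referenced by backup roots are not
    reused right away; give them up only when allocation would hit ENOSPC."""

    def __init__(self, free_extents: Iterable[int], keep: int = 4) -> None:
        self.keep = keep
        self.free: Set[int] = set(free_extents)
        self.history: Deque[Set[int]] = deque()  # oldest transaction first

    def commit(self, pinned: Iterable[int]) -> None:
        # Instead of unpinning at commit, remember this transaction's extents.
        self.history.append(set(pinned))
        # Only transactions that fall outside the retention window are unpinned.
        while len(self.history) > self.keep:
            self.free |= self.history.popleft()

    def allocate(self) -> int:
        # ENOSPC path: sacrifice retained extents, oldest transaction first.
        while not self.free and self.history:
            self.free |= self.history.popleft()
        if not self.free:
            raise OSError(errno.ENOSPC, "no space left")
        extent = min(self.free)  # deterministic pick, just for the example
        self.free.remove(extent)
        return extent


if __name__ == "__main__":
    h = PinnedHistory(free_extents=[], keep=4)
    for txn in range(5):
        h.commit({txn})  # transaction N pins extent N
    print(sorted(h.free), len(h.history))  # prints [0] 4
```

Releasing the oldest transaction first seems a reasonable choice, since the newest backup roots are the ones most worth preserving. Qu's ENOSPC concern shows up in `allocate`, which gives up that protection rather than failing while retained extents still exist.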