On 2020/7/2 11:28 PM, Josef Bacik wrote:
> On 7/1/20 11:09 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/7/2 3:53 AM, Josef Bacik wrote:
>>> On 7/1/20 3:43 PM, waxhead wrote:
>>>>
>>>>
>>>> Josef Bacik wrote:
>>>>> One of the things that came up consistently in talking with Fedora
>>>>> about switching to btrfs as default is that btrfs is particularly
>>>>> vulnerable to metadata corruption.  If any of the core global roots
>>>>> are corrupted, the fs is unmountable and fsck can't usually do
>>>>> anything for you without some special options.
>>>>>
>>>>> Qu addressed this sort of with rescue=skipbg, but that's poorly
>>>>> named as what it really does is just allow you to operate without
>>>>> an extent root.  However there are a lot of other roots, and I'd
>>>>> rather not have to do
>>>>>
>>>>> mount -o rescue=skipbg,rescue=nocsum,rescue=nofreespacetree,rescue=blah
>>>>>
>>>>> Instead take his original idea and modify it so it just works for
>>>>> everything.  Turn it into rescue=onlyfs, and then any major root we
>>>>> fail to read just gets left empty and we carry on.
>>>>>
>>>>> Obviously if the fs roots are screwed then the user is in trouble, but
>>>>> otherwise this makes it much easier to pull stuff off the disk without
>>>>> needing our special rescue tools.  I tested this with my TEST_DEV that
>>>>> had a bunch of data on it by corrupting the csum tree and then reading
>>>>> files off the disk.
>>>>>
>>>>> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>>>>> ---
>>>>
>>>> Just an idea inspired from RAID1c3 and RAID1c3, how about introducing
>>>> DUP2 and/or even DUP3 making multiple copies of the metadata to
>>>> increase the chance to recover metadata on even a single storage
>>>> device?
>>>
>>> Because this only works on HDD.  On SSD's concurrent writes will often
>>> be shunted to the same erase block, and if the whole erase block goes,
>>> so do all of your copies.  This is why we default to 'single' for SSD's.
>>>
>>> The one thing I _do_ want to do is make better use of the backup roots.
>>> Right now we always free the pinned extents once the transaction
>>> commits, which makes the backup roots useless as we're likely to re-use
>>> those blocks.
>>
>> IIRC Filipe tried this before and didn't go that direction due to ENOSPC.
>> As we need to commit multiple transactions to free the pinned extents.
>>
>> But maybe the latest async pinned extent drop could solve the problem?
>>
>
> Yeah before it was tricky, but with Nikolay's work it made async pinned
> extent drop possible, I've been testing that patch internally.
>
> Now it's just a matter of keeping the last 4 transactions worth of
> pinned around and only unpinning under enospc conditions.  I'll dig out
> the async unpinning and send that up next week since that's already
> valuable by itself, and then we can talk about wiring up the ENOSPC part
> of it.  Thanks,

That's really awesome, let's make btrfs the most bulletproof fs then!

Thanks,
Qu

>
> Josef
>
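
For illustration only, the retention policy discussed above (keep the pinned extents of the last 4 committed transactions, and unpin them early only under ENOSPC) could be modelled roughly as below. This is a toy userspace sketch in Python, not btrfs code: the class `PinnedTracker`, its methods, and the extent names are all hypothetical, and it assumes that a backup root stays usable exactly as long as the extents pinned by its transaction have not been released for reuse.

```python
from collections import deque

RETAINED_TRANSACTIONS = 4  # "last 4 transactions" from the thread


class PinnedTracker:
    """Toy model of deferred unpinning (hypothetical, not btrfs code)."""

    def __init__(self, retain=RETAINED_TRANSACTIONS):
        self.retain = retain
        self.current = set()     # extents pinned by the running transaction
        self.retained = deque()  # (transid, extents) of committed transactions, oldest first
        self.free = set()        # extents released for reuse
        self.transid = 0

    def pin(self, extent):
        self.current.add(extent)

    def commit(self):
        # Keep the pinned set instead of freeing it at commit time.
        self.transid += 1
        self.retained.append((self.transid, self.current))
        self.current = set()
        # Only transactions older than the retention window are unpinned.
        while len(self.retained) > self.retain:
            _, extents = self.retained.popleft()
            self.free |= extents

    def reclaim_for_enospc(self, needed):
        """Unpin oldest retained transactions until `needed` extents are free."""
        while len(self.free) < needed and self.retained:
            _, extents = self.retained.popleft()
            self.free |= extents
        return len(self.free) >= needed

    def usable_backup_roots(self):
        # Transaction ids whose pinned extents are still untouched.
        return [transid for transid, _ in self.retained]
```

In this model, space pressure gives up the oldest backup root first, so the most recent ones survive as long as possible.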
