Re: [PATCH][RFC] btrfs: introduce rescue=onlyfs

On 2020/7/2 11:28 PM, Josef Bacik wrote:
> On 7/1/20 11:09 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/7/2 3:53 AM, Josef Bacik wrote:
>>> On 7/1/20 3:43 PM, waxhead wrote:
>>>>
>>>>
>>>> Josef Bacik wrote:
>>>>> One of the things that came up consistently in talking with Fedora
>>>>> about
>>>>> switching to btrfs as default is that btrfs is particularly vulnerable
>>>>> to metadata corruption.  If any of the core global roots are
>>>>> corrupted,
>>>>> the fs is unmountable and fsck can't usually do anything for you
>>>>> without
>>>>> some special options.
>>>>>
>>>>> Qu addressed this sort of with rescue=skipbg, but that's poorly
>>>>> named as
>>>>> what it really does is just allow you to operate without an extent
>>>>> root.
>>>>> However there are a lot of other roots, and I'd rather not have to do
>>>>>
>>>>> mount -o
>>>>> rescue=skipbg,rescue=nocsum,rescue=nofreespacetree,rescue=blah
>>>>>
>>>>> Instead take his original idea and modify it so it just works for
>>>>> everything.  Turn it into rescue=onlyfs, and then any major root we
>>>>> fail
>>>>> to read just gets left empty and we carry on.
>>>>>
>>>>> Obviously if the fs roots are screwed then the user is in trouble, but
>>>>> otherwise this makes it much easier to pull stuff off the disk without
>>>>> needing our special rescue tools.  I tested this with my TEST_DEV that
>>>>> had a bunch of data on it by corrupting the csum tree and then reading
>>>>> files off the disk.
>>>>>
>>>>> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>>>>> ---
>>>>
>>>> Just an idea inspired by RAID1c3 and RAID1c4, how about introducing
>>>> DUP2 and/or even DUP3 making multiple copies of the metadata to
>>>> increase the chance to recover metadata on even a single storage
>>>> device?
>>>
>>> Because this only works on HDD.  On SSD's concurrent writes will often
>>> be shunted to the same erase block, and if the whole erase block goes,
>>> so do all of your copies.  This is why we default to 'single' for SSD's.
>>>
>>> The one thing I _do_ want to do is make better use of the backup roots.
>>> Right now we always free the pinned extents once the transaction
>>> commits, which makes the backup roots useless as we're likely to re-use
>>> those blocks.
>>
>> IIRC Filipe tried this before and didn't go that direction due to ENOSPC.
>> As we need to commit multiple transactions to free the pinned extents.
>>
>> But maybe the latest async pinned extent drop could solve the problem?
>>
> 
> Yeah, before it was tricky, but Nikolay's work made async pinned
> extent drop possible; I've been testing that patch internally.
> 
> Now it's just a matter of keeping the last 4 transactions worth of
> pinned around and only unpinning under enospc conditions.  I'll dig out
> the async unpinning and send that up next week since that's already
> valuable by itself, and then we can talk about wiring up the ENOSPC part
> of it.  Thanks,

That's really awesome, let's make btrfs the most bulletproof fs then!

Thanks,
Qu

> 
> Josef
> 


