Re: [PATCH][RFC] btrfs: introduce rescue=onlyfs


On 01/07/20 17:39, David Sterba wrote:
> On Wed, Jul 01, 2020 at 05:22:18PM +0200, Lukas Straub wrote:
>> On Wed,  1 Jul 2020 10:44:38 -0400
>> Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>>
>>> One of the things that came up consistently in talking with Fedora about
>>> switching to btrfs as default is that btrfs is particularly vulnerable
>>> to metadata corruption.  If any of the core global roots are corrupted,
>>> the fs is unmountable and fsck can't usually do anything for you without
>>> some special options.
>>>
>>> Qu addressed this sort of with rescue=skipbg, but that's poorly named as
>>> what it really does is just allow you to operate without an extent root.
>>> However there are a lot of other roots, and I'd rather not have to do
>>>
>>> mount -o rescue=skipbg,rescue=nocsum,rescue=nofreespacetree,rescue=blah
>>>
>>> Instead take his original idea and modify it so it just works for
>>> everything.  Turn it into rescue=onlyfs, and then any major root we fail
>>> to read just gets left empty and we carry on.
>>>
>>> Obviously if the fs roots are screwed then the user is in trouble, but
>>> otherwise this makes it much easier to pull stuff off the disk without
>>> needing our special rescue tools.  I tested this with my TEST_DEV that
>>> had a bunch of data on it by corrupting the csum tree and then reading
>>> files off the disk.
>>>
>>> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>>> ---
>>>
>>> I'm not married to the rescue=onlyfs name, if we can think of something better
>>> I'm good.
>>
>> Maybe you could go a step further and automatically switch to rescue
>> mode if something is corrupt. This is easier for the user than having
>> to remember the mount flags.
>
> We don't want to do the auto-switching in general as it's a non-standard
> situation.  It's better to get user attention than to silently mount
> with limited capabilities and then let the user find out that something
> went wrong, eg. system services randomly failing to start or work.


Eh. I'm sure stopping boot and dropping to initramfs shell is a great way to get someone's attention.

AFAIK, with mdadm or hardware RAID the main way to notify the administrator of issues is to send an email or to report the error through the server fleet management software.

-Alberto


