Re: Replacing a (or two?) failed drive(s) in RAID-1 btrfs filesystem

On Mon, Feb 9, 2015 at 5:54 PM, constantine <costas.magnuse@xxxxxxxxx> wrote:

> 1.  I am testing various files and all seem readable. Is there a way
> to list every file that resides on a particular device (like
> /dev/sdc1?) so as to check them? There are a handful of files that
> seem corrupted, since I get from scrub:
> """
> BTRFS: checksum error at logical 10792783298560 on dev /dev/sdc1,
> sector 737159648, root 5, inode 1376754, offset 175428419584, length
> 4096, links 1 (path: long/path/file.img)
> """,
> but are these the only files that could be corrupted?

It should be true that the only corrupt files are the ones scrub lists.
I don't have a good suggestion for the first question; maybe btrfs
restore or btrfs-debug-tree could help, assuming you want something
that doesn't depend on mounting the filesystem and just running
recursive ls or tree commands.
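For a rough check, something like this should work (untested sketch;
/mnt/mountpoint is the mountpoint used earlier in the thread). Scrub
just the suspect device of the mounted filesystem, in the foreground
so errors print as they're found:

# btrfs scrub start -Bd /dev/sdc1

And if a checksum error only gives you a logical address, map it back
to file paths with:

# btrfs inspect-internal logical-resolve 10792783298560 /mnt/mountpoint

That won't list every file with extents on /dev/sdc1, but it will tell
you which of them fail their checksums.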




>
>
> 2. Chris mentioned:
>
> A. On Mon, Feb 9, 2015 at 12:31 AM, Chris Murphy
> <lists@xxxxxxxxxxxxxxxxx> wrote:
>> [[[try # btrfs device delete /dev/sdc1 /mnt/mountpoint]]]. Just realize that any data that's on both the
>> failed drive and sdc1 will be lost
>
> and later
>
> B. On Mon, Feb 9, 2015 at 1:34 AM, Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
>> So now I have a 4 device
>> raid1 mounted degraded. And I can still device delete another device.
>> So one device missing and one device removed.
>
> So when I do the "# btrfs device delete /dev/sdc1 /mnt/mountpoint" the
> normal behavior would for the files that are located in /dev/sdc1 (and
> also were on the missing/failed drive) to be transferred to other
> drives and not lose them, right? (Does B. hold and contradict A.?)

In the normal case, a non-degraded volume, a device delete will
successfully migrate data, and the volume remains non-degraded.

In the unusual case, a degraded volume, a device delete is suspiciously
permitted. I think this is risky and maybe ought to be disallowed, or
at least require --force. The reason is that it's a degraded array:
the first order of business is to do a 'btrfs replace start' or, if
enough devices exist, a 'btrfs device delete missing' to get the volume
from degraded back to normal, and only then do any additional device
deletes.
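Something like this, roughly (untested sketch; /dev/sdb1, devid 3 and
/dev/sde1 are placeholders for one surviving device, the missing
device's id, and the new drive):

# mount -o degraded /dev/sdb1 /mnt/mountpoint
# btrfs replace start 3 /dev/sde1 /mnt/mountpoint

or, if there are enough remaining devices to hold both raid1 copies,

# btrfs device delete missing /mnt/mountpoint

and only then remove anything else.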

But in the even more unusual case, a degraded volume with a 2nd device
that's producing a huge pile of read, write, and corruption errors,
Btrfs obviously can't migrate any data off the dead/removed drive, and
it also has problems migrating the data that now only exists on the
2nd device that's spitting out errors. I don't expect this device
delete to succeed.

The difference between case A and case B is that in B there isn't a
2nd drive spitting out a pile of errors. It's merely degraded, with a
drive being deleted, and even that ended in a kernel panic for me,
which I've reproduced. However, as a follow-up: after rebooting, the
btrfs volume is mountable (degraded) without error, I can then btrfs
device delete missing, and remount normally (not degraded). So this is
good.
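In command form, that recovery sequence was roughly (device names are
placeholders for my test setup):

# mount -o degraded /dev/sdb1 /mnt/mountpoint
# btrfs device delete missing /mnt/mountpoint
# umount /mnt/mountpoint
# mount /dev/sdb1 /mnt/mountpoint
# btrfs filesystem show /mnt/mountpoint

with filesystem show confirming no device is reported missing at the
end.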

> After the whole process, I suppose I will have a more robust array
> structure with the RED/RAID drives and appropriate cron jobs as
> indicated in the thread.

Sounds like a good ending.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



