Re: Can't repair raid 1 array after drive failure

On Fri, May 1, 2020 at 11:02 AM Rollo ro <rollodroid@xxxxxxxxx> wrote:
>
> Hi again,
> I'm still running into problems with btrfs. For testing purposes, I
> created a raid 1 filesystem yesterday and let the computer copy a ton
> of data on it over night:
>
> Label: 'BTRFS1'  uuid: 61e5aba9-6811-46ae-9396-35a72d3b1117
>         Total devices 3 FS bytes used 1.15TiB
>         devid    1 size 5.46TiB used 1.16TiB path /dev/sdc1
>         devid    3 size 698.64GiB used 10.00GiB path /dev/sdf
>         devid    4 size 1.82TiB used 1.15TiB path /dev/sde
>
> Today I started scrub and looked at the status some hours later, which
> gave thousands of errors on drive 4:

What happened to devid 2?

>
> root@OMV:/var# btrfs scrub status /srv/dev-disk-by-label-BTRFS1/
> scrub status for 61e5aba9-6811-46ae-9396-35a72d3b1117
>         scrub started at Fri May  1 11:37:36 2020, running for 04:37:48
>         total bytes scrubbed: 1.58TiB with 75751000 errors
>         error details: read=75751000
>         corrected errors: 0, uncorrectable errors: 75750996,
> unverified errors: 0
>
> (Not shown here that it was drive 4, but it was)
>
> Then found that the drive is missing:
>
> Label: 'BTRFS1'  uuid: 61e5aba9-6811-46ae-9396-35a72d3b1117
>         Total devices 3 FS bytes used 1.15TiB
>         devid    1 size 5.46TiB used 1.16TiB path /dev/sdc1
>         devid    3 size 698.64GiB used 10.00GiB path /dev/sdf
>         *** Some devices missing
>
> Canceled scrub:
> root@OMV:/var# btrfs scrub cancel /srv/dev-disk-by-label-BTRFS1/
> scrub cancelled
>
> Stats showing lots of errors on sde, which is the missing drive:
> root@OMV:/var# btrfs device stats /srv/dev-disk-by-label-BTRFS1/
> [/dev/sdc1].write_io_errs    0
> [/dev/sdc1].read_io_errs     0
> [/dev/sdc1].flush_io_errs    0
> [/dev/sdc1].corruption_errs  0
> [/dev/sdc1].generation_errs  0
> [/dev/sdf].write_io_errs    0
> [/dev/sdf].read_io_errs     0
> [/dev/sdf].flush_io_errs    0
> [/dev/sdf].corruption_errs  0
> [/dev/sdf].generation_errs  0
> [/dev/sde].write_io_errs    154997860
> [/dev/sde].read_io_errs     77170574
> [/dev/sde].flush_io_errs    310
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
>
> I tried to replace
> root@OMV:/var# btrfs replace start 2 /dev/sdb /srv/dev-disk-by-label-BTRFS1/ &
> [1] 1809
> root@OMV:/var# ERROR: '2' is not a valid devid for filesystem
> '/srv/dev-disk-by-label-BTRFS1/'
>
> --> That's inconsistent with the device remove syntax, which allows
> using a non-existent number? I try again using the /dev/sdX syntax,
> but as sde is gone, I rescan and now it's sdi!

devid 2 has been missing since the very start of the email, so it is
not a valid source devid for a replacement.
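For a device that has already dropped out, the replace has to name the
missing device's devid (4, per the 'btrfs filesystem show' output
earlier), not an unused number. A rough sketch of the usual sequence,
assuming /dev/sdb really is the blank replacement drive and using the
mount point from your output:

```shell
# If the filesystem refuses a normal mount with a device missing,
# mount it degraded first (raid1 tolerates one missing device):
mount -o degraded /dev/sdc1 /srv/dev-disk-by-label-BTRFS1

# Replace the missing device by its devid; btrfs rebuilds the data
# onto /dev/sdb from the surviving raid1 copies:
btrfs replace start 4 /dev/sdb /srv/dev-disk-by-label-BTRFS1

# Check progress:
btrfs replace status /srv/dev-disk-by-label-BTRFS1
```

But with devices bouncing on and off the bus, I'd sort out the
hardware before attempting the rebuild.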

And devices vanishing and reappearing as different device nodes
suggests they're on a flaky or transient bus. Are these SATA drives in
USB enclosures? If so, how are they connected?

A complete dmesg please (not trimmed, starting at boot) would be useful.
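To save a round trip, the transport and drive-health details can be
gathered with something like the following (smartctl is from
smartmontools, and often needs '-d sat' through a USB bridge):

```shell
# Show how each whole disk is attached; TRAN reads 'usb' for USB enclosures
lsblk -d -o NAME,TRAN,MODEL,SERIAL,SIZE

# SMART attributes for the drive that threw the errors
smartctl -a /dev/sde

# Full kernel log since boot, with human-readable timestamps
dmesg --ctime > dmesg.txt
```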

One device is missing, and another one vanished and reappeared; I
don't know whether Btrfs can really handle that combination perfectly.

> Version info:
> btrfs-progs v4.20.1
> Kernel 5.4.0-0.bpo.4-amd64

It's probably not related to the problem, which seems to be hardware
related. But btrfs-progs v4.20.1 is ~16 months of development behind
v5.6, which is current, and there have been thousands of Btrfs-only
changes in the kernel between v5.4 and v5.6:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=v5.6.8&id2=v5.4&dt=2


-- 
Chris Murphy


