On Fri, May 1, 2020 at 11:02 AM Rollo ro <rollodroid@xxxxxxxxx> wrote:
>
> Hi again,
> I'm still running into problems with btrfs. For testing purposes, I
> created a raid 1 filesystem yesterday and let the computer copy a ton
> of data on it over night:
>
> Label: 'BTRFS1' uuid: 61e5aba9-6811-46ae-9396-35a72d3b1117
> Total devices 3 FS bytes used 1.15TiB
> devid 1 size 5.46TiB used 1.16TiB path /dev/sdc1
> devid 3 size 698.64GiB used 10.00GiB path /dev/sdf
> devid 4 size 1.82TiB used 1.15TiB path /dev/sde
>
> Today I started scrub and looked at the status some hours later, which
> gave thousands of errors on drive 4:

What happened to devid 2?

>
> root@OMV:/var# btrfs scrub status /srv/dev-disk-by-label-BTRFS1/
> scrub status for 61e5aba9-6811-46ae-9396-35a72d3b1117
> scrub started at Fri May 1 11:37:36 2020, running for 04:37:48
> total bytes scrubbed: 1.58TiB with 75751000 errors
> error details: read=75751000
> corrected errors: 0, uncorrectable errors: 75750996,
> unverified errors: 0
>
> (Not shown here that it was drive 4, but it was)
>
> Then found that the drive is missing:
>
> Label: 'BTRFS1' uuid: 61e5aba9-6811-46ae-9396-35a72d3b1117
> Total devices 3 FS bytes used 1.15TiB
> devid 1 size 5.46TiB used 1.16TiB path /dev/sdc1
> devid 3 size 698.64GiB used 10.00GiB path /dev/sdf
> *** Some devices missing
>
> Canceled scrub:
> root@OMV:/var# btrfs scrub cancel /srv/dev-disk-by-label-BTRFS1/
> scrub cancelled
>
> Stats showing lots of error on sde, which is the missing drive:
> root@OMV:/var# btrfs device stats /srv/dev-disk-by-label-BTRFS1/
> [/dev/sdc1].write_io_errs 0
> [/dev/sdc1].read_io_errs 0
> [/dev/sdc1].flush_io_errs 0
> [/dev/sdc1].corruption_errs 0
> [/dev/sdc1].generation_errs 0
> [/dev/sdf].write_io_errs 0
> [/dev/sdf].read_io_errs 0
> [/dev/sdf].flush_io_errs 0
> [/dev/sdf].corruption_errs 0
> [/dev/sdf].generation_errs 0
> [/dev/sde].write_io_errs 154997860
> [/dev/sde].read_io_errs 77170574
> [/dev/sde].flush_io_errs 310
> [/dev/sde].corruption_errs 0
> [/dev/sde].generation_errs 0
>
>
> I tried to replace
> root@OMV:/var# btrfs replace start 2 /dev/sdb /srv/dev-disk-by-label-BTRFS1/ &
> [1] 1809
> root@OMV:/var# ERROR: '2' is not a valid devid for filesystem
> '/srv/dev-disk-by-label-BTRFS1/'
>
> --> That's inconsistent with the device remove syntax, as it allows to
> use a non-existing number? I try again using the /dev/sdx syntax, but
> as sde is gone, I rescan and now it's sdi!

devid 2 was missing from the very start of the email, so it is not a
valid source for removal. And devices vanishing and reappearing as
other nodes suggests they're on a flaky or transient bus. Are these
SATA drives in USB enclosures? And if so, how are they connected?

A complete dmesg please (not trimmed, starting at boot) would be useful.

One device is missing, and another one vanished and reappeared; I don't
know whether Btrfs can really handle this case perfectly.

> Version info:
> btrfs-progs v4.20.1
> Kernel 5.4.0-0.bpo.4-amd64

It's probably not related to the problem, which seems to be hardware
related. But btrfs-progs v4.20.1 is ~16 months of development behind
v5.6, which is current. And there have been thousands of changes in the
kernel just for Btrfs.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=v5.6.8&id2=v5.4&dt=2

--
Chris Murphy
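
For what it's worth, counters like the ones in the quoted `btrfs device stats` output can be checked mechanically rather than by eye. The following is only a minimal Python sketch, assuming the `[device].counter value` line format shown in the quote above. The function name `failing_devices` and the `SAMPLE` text are made up for illustration (the sample is a subset of the quoted output), and nothing here is part of btrfs-progs.

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/sde].write_io_errs 154997860".
# Assumption: the stats output keeps the "[device].counter value" shape
# seen in the quoted transcript.
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def failing_devices(stats_text):
    """Return {device: {counter: value}} for every non-zero counter."""
    failing = defaultdict(dict)
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # skip prompts, blank lines, anything unexpected
        value = int(match.group("value"))
        if value > 0:
            failing[match.group("dev")][match.group("counter")] = value
    return dict(failing)


# Subset of the output quoted in this thread (hypothetical variable name).
SAMPLE = """\
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
[/dev/sde].write_io_errs 154997860
[/dev/sde].read_io_errs 77170574
[/dev/sde].flush_io_errs 310
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0
"""

if __name__ == "__main__":
    for dev, counters in failing_devices(SAMPLE).items():
        print(dev, counters)
```

In practice the text would come from running `btrfs device stats` against the mount point instead of from a hard-coded string. Note that the counters are keyed by device node, so a drive that drops off the bus and comes back under a different name (sde turning into sdi, as described above) may show up under the new name after a rescan.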
