Re: USB reset + raid6 = majority of files unreadable

OK, the last message was hard to read because of line wrapping, so here
it is again with the log output replaced by ix.io links.

To expand on my previous message, I have what was a 3-drive
filesystem with RAID1 metadata and RAID5 data.  One drive failed so I
mounted degraded, added a replacement and tried to remove the missing
(failed) drive.  It won't remove - the remove aborts with an I/O error
after checksum errors have been logged as reported in my last e-mail.
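
For anyone following along, the sequence described above can be written
out as a minimal dry-run sketch. It only builds and prints the command
lines, it does not run them. The mount point /mnt/data and the new
device /dev/sdX are hypothetical placeholders, and /dev/sda is simply
the device named in the check output; the exact invocations used were
not given in the message.

```python
import shlex


def replacement_plan(mount_point, surviving_dev, new_dev):
    """Return the commands for the described sequence, in order.

    All three arguments are placeholders supplied by the caller;
    nothing is executed here.
    """
    return [
        # Mount the filesystem with one member absent.
        ["mount", "-o", "degraded", surviving_dev, mount_point],
        # Add the replacement drive to the mounted filesystem.
        ["btrfs", "device", "add", new_dev, mount_point],
        # Ask btrfs to drop the absent member ("missing" is a keyword).
        ["btrfs", "device", "remove", "missing", mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for argv in replacement_plan("/mnt/data", "/dev/sda", "/dev/sdX"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The last step is the one that aborts with the I/O error in this report.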

I have run a btrfs check on the filesystem and this gives the following output:

WARNING: filesystem mounted, continuing because of --force
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/sda
UUID: a3d38933-ee90-4b84-8f24-3a5c36dfd9be
found 9834224820224 bytes used, no error found
total csum bytes: 9588337304
total tree bytes: 13656375296
total fs tree bytes: 2760966144
total extent tree bytes: 388759552
btree space waste bytes: 1321640764
file data blocks allocated: 9820591190016
 referenced 9820501786624
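
To give a sense of scale, the raw byte counts in the output above can be
converted to binary units. This is only a small arithmetic sketch; the
helper name to_binary_units is made up for illustration, and the figures
are copied from the check output.

```python
def to_binary_units(n_bytes):
    """Scale a byte count down by powers of 1024; return (value, unit)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n_bytes)
    for unit in units:
        # Stop once the value fits the unit, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            return value, unit
        value /= 1024


if __name__ == "__main__":
    # Figures taken verbatim from the btrfs check output.
    figures = {
        "bytes used": 9834224820224,
        "total tree bytes": 13656375296,
        "total extent tree bytes": 388759552,
    }
    for label, n_bytes in figures.items():
        value, unit = to_binary_units(n_bytes)
        print(f"{label}: {value:.2f} {unit}")
```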

The filesystem was mounted r/o to avoid any changes upsetting the
check.  I have now started a scrub to see what that finds, but the ETA
is Sat Feb 29 07:57:49 2020, so I will report the results then.

Regarding kernel messages I have found a few of these in the log
starting before the disc failure:

http://ix.io/2cLX

but I think these may have nothing to do with it - they may relate to
another filesystem (root), and the timeout may be because that is on a
USB stick which is rather slow.  My reason for thinking that is that the
process that gave rise to the timeout appears to be pacman, the Arch
package manager, which primarily writes to the root filesystem.

It looks like the disc started to fail here:

http://ix.io/2cM1

This goes on for pages and covers quite a few days; I can extract more
if it is of interest.  Next is a reboot - this is the shutdown part:

http://ix.io/2cM2

then on the way back up:

http://ix.io/2cM3

then, after mounting degraded, adding a new device and attempting to
remove the missing one:

http://ix.io/2cM4

and at that point the device remove aborted with an I/O error.

I did discover I could use balance with a filter to move much of the
data onto the three working discs, away from the missing one, but I also
discovered that whenever the checksum error appears the space cache
seems to get corrupted.  Any further balance attempt then gets stuck in
a loop.  Mounting with clear_cache resolves that.
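
As a rough sketch of that workaround, the helper below builds a mount
command line with a comma-separated option list. clear_cache is the
btrfs mount option that forces the free space cache to be rebuilt.
Combining it with degraded is an assumption here (a device is still
missing); the device and mount point are hypothetical placeholders, and
nothing is executed.

```python
import shlex


def mount_command(device, mount_point, options):
    """Build a mount command with a comma-separated -o option list."""
    return ["mount", "-o", ",".join(options), device, mount_point]


if __name__ == "__main__":
    # Assumption: "degraded" is still required alongside "clear_cache"
    # while one member of the filesystem is absent.
    argv = mount_command("/dev/sda", "/mnt/data", ["degraded", "clear_cache"])
    print(" ".join(shlex.quote(arg) for arg in argv))
```

clear_cache only needs to be passed for the one mount; the cache is
rebuilt and later mounts can omit it.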

Regards.
Steve.


