OK, so the last message wasn't easy to read due to line wrapping, so here it is again with the log output replaced by ix.io links.

To expand on my previous message: I have what was a 3-drive filesystem with RAID1 metadata and RAID5 data. One drive failed, so I mounted degraded, added a replacement and tried to remove the missing (failed) drive. It won't remove: the remove aborts with an I/O error after checksum errors have been logged, as reported in my last e-mail.

I have run a btrfs check on the filesystem and this gives the following output:

    WARNING: filesystem mounted, continuing because of --force
    [1/7] checking root items
    [2/7] checking extents
    [3/7] checking free space cache
    [4/7] checking fs roots
    [5/7] checking only csums items (without verifying data)
    [6/7] checking root refs
    [7/7] checking quota groups skipped (not enabled on this FS)
    Opening filesystem to check...
    Checking filesystem on /dev/sda
    UUID: a3d38933-ee90-4b84-8f24-3a5c36dfd9be
    found 9834224820224 bytes used, no error found
    total csum bytes: 9588337304
    total tree bytes: 13656375296
    total fs tree bytes: 2760966144
    total extent tree bytes: 388759552
    btree space waste bytes: 1321640764
    file data blocks allocated: 9820591190016
     referenced 9820501786624

The filesystem was mounted r/o to avoid any changes upsetting the check. I have now started a scrub to see what that finds, but the ETA is Sat Feb 29 07:57:49 2020, so I will report the results then.

Regarding kernel messages, I have found a few of these in the log, starting before the disc failure:

http://ix.io/2cLX

but I think these may have nothing to do with it. They may relate to another filesystem (root), and the timeout may be because that is a USB stick, which is rather slow. My reason for thinking that is that the process that gave rise to the timeout appears to be pacman, the Arch package manager, which primarily writes to the root filesystem.
It looks like the disc started to fail here:

http://ix.io/2cM1

This goes on for pages and for quite a few days; I can extract more if it is of interest. Next is a reboot. This is the shutdown part:

http://ix.io/2cM2

then on the way back up:

http://ix.io/2cM3

then, after mounting degraded, adding a new device and attempting to remove the missing one:

http://ix.io/2cM4

At that point the device remove aborted with an I/O error.

I did discover I could use balance with a filter to move much of the data onto the three working discs, away from the missing one, but I also discovered that whenever the checksum error appears, the space cache seems to get corrupted. Any further balance attempt then gets stuck in a loop. Mounting with clear_cache resolves that.

Regards.
Steve.
