Hi all,
I have several BTRFS success stories, and I've been a happy user for quite a long time now. I was therefore surprised to hit BTRFS corruption on a system I had just installed.
I use NixOS, unstable branch (Linux kernel 4.19.12). The system runs on an SSD with an ext4 boot partition, a simple btrfs root with some subvolumes, and some swap space used only for hibernation. I was working on my server as normal when I noticed that all of my BTRFS subvolumes had been remounted read-only. Shortly afterwards, I started getting various I/O errors ("bus error" from journalctl, "I/O error" from ls, etc.). I halted the system (hard reboot); on the next boot, the BTRFS partition would not mount. I suspected the corruption to be disk-related, but smartctl does not show any warning for the disk, and the ext4 partition seems healthy.
These are the kernel messages logged when I attempt to mount the partition:
Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0
Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5
Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.
Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed
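For anyone who wants to pull only the BTRFS lines out of a longer journal, here is a rough Python sketch; it was not part of the original debugging. The function name btrfs_messages and the regular expression are illustrative assumptions based only on the line format shown above, and other kernel versions may format these lines differently.

```python
import re

# Matches kernel lines of the form seen above, e.g.
#   "... kernel: BTRFS error (device sdd2): open_ctree failed"
# The pattern is an assumption derived from this log excerpt only.
BTRFS_LINE = re.compile(
    r"kernel: BTRFS (?P<severity>\w+) \(device (?P<device>[^)]+)\): (?P<message>.*)$"
)


def btrfs_messages(lines):
    """Return (severity, device, message) for each BTRFS kernel line."""
    found = []
    for line in lines:
        match = BTRFS_LINE.search(line)
        if match:
            found.append((match["severity"], match["device"], match["message"]))
    return found


# Three of the lines from the mount attempt, copied from the log above.
sample = [
    "Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5",
    "Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.",
    "Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed",
]

for severity, device, message in btrfs_messages(sample):
    print(f"{device} [{severity}] {message}")
```

The unrelated systemd line is skipped, which makes the two BTRFS errors easier to spot in a busy journal.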
Searching for the error code led me to these two recent threads:
https://www.spinics.net/lists/linux-btrfs/msg84973.html
https://www.spinics.net/lists/linux-btrfs/msg83833.html
Using btrfs-progs-4.15.1, "btrfs restore /dev/sdd2 /tmp/" fails with:
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
Could not open root, trying backup super
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
Could not open root, trying backup super
ERROR: superblock bytenr <X> is larger than device size <Y>
Could not open root, trying backup super
Using btrfs-progs-4.19.1, "btrfs restore /dev/sdd2 /tmp/" succeeds with some exceptions:
We have looped trying to restore files in /@/nix/store too many times to be making progress, stopping
I do not have much time to debug the issue and I did not lose any important data, so I tried a couple of commands suggested in the threads and in the docs (without fully understanding them):
"btrfs rescue zero-log /dev/sdd2":
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
ERROR: could not open ctree
"btrfs check --repair /dev/sdd2" (I know, I was not supposed to run this one):
Opening filesystem to check...
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
ERROR: could not open ctree
Same result for "btrfs check --init-csum-tree /dev/sdd2".
I expect to wipe the disk and start from scratch in the next few days. I just wanted to report this in the hope that it helps development (sorry for the redaction). If you need more information, I'll be glad to help as much as I can!
Thank you for your work,
Cheers,
- b11g