Re: BTRFS corruption: open_ctree failed

Follow-up: the issue was a faulty DIMM. By some strange coincidence, only the memory allocated to disk caches appeared to be corrupted, while the rest of the system worked flawlessly most of the time.

I would guess that BTRFS tried to self-heal based on the cached data, ultimately corrupting the file system beyond salvation?

If anyone gets here with similar problems - memtest your RAM before doing anything!
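
To check whether a log shows the same symptoms as the kernel messages quoted in the original report below, something like the following could be used. It is only a rough sketch: the patterns are copied from those quoted messages, while the function name and category labels are made up for illustration. It summarizes log lines and says nothing about the cause, so a memory test is still needed.

```python
import re
from collections import Counter

# Patterns copied from the kernel messages quoted in this thread.
# The category names are illustrative labels, not kernel terminology.
PATTERNS = {
    "checksum_failure": re.compile(
        r"BTRFS warning \(device ([^)]+)\): .*checksum verify failed"),
    "block_group_read_error": re.compile(
        r"BTRFS error \(device ([^)]+)\): failed to read block groups"),
    "open_ctree_failed": re.compile(
        r"BTRFS error \(device ([^)]+)\): open_ctree failed"),
}


def summarize_btrfs_errors(lines):
    """Count matching BTRFS messages per (device, category) pair."""
    counts = Counter()
    for line in lines:
        for category, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), category)] += 1
    return counts


# The four log lines from the original report (values redacted as posted).
SAMPLE = [
    "Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0",
    "Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5",
    "Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.",
    "Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed",
]

if __name__ == "__main__":
    for (device, category), count in sorted(summarize_btrfs_errors(SAMPLE).items()):
        print(f"{device}: {category} x{count}")
```

The same function should work on real kernel messages, for example the lines printed by "journalctl -k", read from a file or from standard input.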

-b11g


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 3 January 2019 01:26, b11g <b11g@xxxxxxxxxxxxxx> wrote:

> Hi all,
>
> I have several BTRFS success stories, and I've been a happy user for quite a long time now. I was therefore surprised to face a BTRFS corruption on a system I'd just installed.
>
> I use NixOS, unstable branch (Linux kernel 4.19.12). The system runs on an SSD with an ext4 boot partition, a simple btrfs root with some subvolumes, and some swap space only used for hibernation. I was working on my server as normal when I noticed all of my BTRFS subvolumes had been remounted ro. After a short time, I started getting various IO errors ("bus error" by journalctl, "I/O error" by ls etc.). I halted the system (hard reboot), and after the reboot the BTRFS partition would not mount. I suspected the corruption to be disk-related, but smartctl does not show any warning for the disk, and the ext4 partition seems healthy.
>
> These are the kernel messages logged when I attempt to mount the partition:
> Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0
> Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5
> Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.
> Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed
>
> Some searches for the error code I got led me to these two recent threads:
> https://www.spinics.net/lists/linux-btrfs/msg84973.html
> https://www.spinics.net/lists/linux-btrfs/msg83833.html
>
> Using btrfs-progs-4.15.1, "btrfs restore /dev/sdd2 /tmp/" fails with:
> checksum verify failed on <N> found <A> wanted <B>
> checksum verify failed on <N> found <A> wanted <B>
> Csum didn't match
> Could not open root, trying backup super
> checksum verify failed on <N> found <A> wanted <B>
> checksum verify failed on <N> found <A> wanted <B>
> Csum didn't match
> Could not open root, trying backup super
> ERROR: superblock bytenr <X> is larger than device size <Y>
> Could not open root, trying backup super
>
> Using btrfs-progs-4.19.1, "btrfs restore /dev/sdd2 /tmp/" succeeds with some exceptions:
> We have looped trying to restore files in /@/nix/store too many times to be making progress, stopping
>
> I do not have much time for debugging the issue and I did not lose important data, so I tried a couple of commands suggested on the threads and in the docs (without fully understanding them):
>
> "btrfs rescue zero-log /dev/sdd2":
> checksum verify failed on <N> found <A> wanted <B>
> checksum verify failed on <N> found <A> wanted <B>
> Csum didn't match
> ERROR: could not open ctree
>
> "btrfs check --repair /dev/sdd2" (I know, I was not supposed to run this one):
> Opening filesystem to check...
> checksum verify failed on <N> found <A> wanted <B>
> checksum verify failed on <N> found <A> wanted <B>
> Csum didn't match
> ERROR: could not open ctree
>
> Same for "btrfs check --init-csum-tree /dev/sdd2".
>
> I expect to wipe the disk and do a clean start in the following days. I just wanted to report this in the hope that it helps development (sorry for the redaction). If you need more information, I'll be glad to help as best I can!
>
> Thank you for your work,
> Cheers,
>
> -b11g





