Responded in-line.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 3 January 2019 05:52, Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> On Wed, Jan 2, 2019 at 5:26 PM b11g b11g@xxxxxxxxxxxxxx wrote:
>
> > Hi all,
> > I have several BTRFS success stories, and I've been a happy user for quite a long time now. I was therefore surprised to face a BTRFS corruption on a system I'd just installed.
> > I use NixOS, unstable branch (linux kernel 4.19.12). The system runs on an SSD with an ext4 boot partition, a simple btrfs root with some subvolumes, and some swap space only used for hibernation. I was working on my server as normal when I noticed all of my BTRFS subvolumes had been remounted ro. After a short time, I started getting various IO errors ("bus error" by journalctl, "I/O error" by ls etc.). I halted the system (hard reboot); on the next boot, the BTRFS partition would not mount. I suspected the corruption to be disk-related, but smartctl does not show any warning for the disk, and the ext4 partition seems healthy.
> > Those are the kernel messages logged when I attempt to mount the partition:
> > Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0
> > Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5
> > Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.
> > Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed
>
> Do you have the entire kernel message from the previous boot when the
> problem started, including I/O errors? We kinda need to see what was
> going on leading up to the read only mount, and the bus and I/O
> errors. journalctl -b-1 -k should do it, or using journalctl
> --list-boots to find it. You can redirect to a file with > and then
> attach to the reply if it's small enough, or put it up somewhere like
> Dropbox or Google Drive if it's too big.
Sadly, I cannot find the journal for the boot in which the system failed in /var/log, only older entries with no I/O errors. If you have any idea where else to look for logs, I can check.
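For anyone hitting the same situation, here is a minimal sketch of the log capture Chris describes, assuming the journal for the failing boot was actually persisted (it may not have been here, possibly because the root filesystem had already gone read-only). The helper name save_prev_kernel_log and the default file name are made up for illustration; only journalctl and its -b, -k and --list-boots options come from the thread.

```shell
# Sketch only: save the kernel messages of the previous boot to a file.
# save_prev_kernel_log is a hypothetical helper, not a systemd tool.
save_prev_kernel_log() {
    out="${1:-prev-boot-kernel.log}"   # assumed default file name
    # -b -1 selects the previous boot, -k limits output to kernel messages.
    journalctl -b -1 -k > "$out"
}

# Sketch only: list the boots journald knows about, so that an older boot
# can be selected with "journalctl -b <offset> -k".
list_known_boots() {
    journalctl --list-boots
}
```

If list_known_boots does not show the boot in which the errors appeared, the messages were most likely never written to disk and cannot be recovered this way.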
>
> btrfs rescue super -v /dev/sdd2
All Devices:
Device: id = 1, name = /dev/sdd2
Before Recovering:
[All good supers]:
device name = /dev/sdd2
superblock bytenr = 65536
device name = /dev/sdd2
superblock bytenr = <big N>
[All bad supers]:
All supers are valid, no need to recover
> btrfs insp dump-s -f /dev/sdd2
superblock: bytenr=65536, device=/dev/sdd2
---------------------------------------------------------
csum_type 0 (crc32c)
csum_size 4
csum 0x<C> [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
fsid <ID>
label main
generation 6337
root <~10^10>
sys_array_size 97
chunk_root_generation 5976
root_level 1
chunk_root <~10^7>
chunk_root_level 0
log_root <~10^9>
log_root_transid 0
log_root_level 0
total_bytes <X:~10^12>
bytes_used <~10^12>
sectorsize 4096
nodesize 16384
leafsize (deprecated) 16384
stripesize 4096
root_dir 6
num_devices 1
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x169
( MIXED_BACKREF |
COMPRESS_LZO |
BIG_METADATA |
EXTENDED_IREF |
SKINNY_METADATA )
cache_generation 6337
uuid_tree_generation 6337
dev_item.uuid <ID2>
dev_item.fsid <ID> [match]
dev_item.type 0
dev_item.total_bytes <X:~10^12>
dev_item.bytes_used <~10^12>
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 1
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0
sys_chunk_array[2048]:
item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM <Y>)
length <L> owner 2 stripe_len 65536 type SYSTEM
io_align 4096 io_width 4096 sector_size 4096
num_stripes 1 sub_stripes 0
stripe 0 devid 1 offset <Y>
dev_uuid <ID2>
backup_roots[4]:
backup 0:
<...>
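As a side note on reading the dump above, the incompat_flags value is a bitmask, and the names that dump-super prints can be re-derived from it. The sketch below is only an illustration: decode_incompat_flags is a hypothetical helper, not part of btrfs-progs, and the bit positions are the ones I understand the kernel headers to define for the first ten incompat features.

```shell
# Sketch only: decode a btrfs incompat_flags bitmask into feature names.
# decode_incompat_flags is a hypothetical helper, not part of btrfs-progs.
decode_incompat_flags() {
    flags=$(( $1 ))   # accepts hex input such as 0x169
    bit=0
    # Names are listed in bit order, starting from bit 0.
    for name in MIXED_BACKREF DEFAULT_SUBVOL MIXED_GROUPS COMPRESS_LZO \
                COMPRESS_ZSTD BIG_METADATA EXTENDED_IREF RAID56 \
                SKINNY_METADATA NO_HOLES; do
        if [ $(( flags & (1 << bit) )) -ne 0 ]; then
            echo "$name"
        fi
        bit=$(( bit + 1 ))
    done
}
```

Running decode_incompat_flags 0x169 prints the same five features listed in the dump (0x169 = 0x1 + 0x8 + 0x20 + 0x40 + 0x100), so the superblock flags themselves look consistent.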
>
> Those are read only. And also try to mount with -o usebackuproot and
> if that fails -o ro,usebackuproot is often more tolerant. But that's
> for getting data off the volume, it's more useful to know why the file
> system broke. And also why btrfs check is failing, given that it's a
> current version.
I got the data back using btrfs restore. Mounting with -o ro,usebackuproot fails with the same errors (open_ctree failed).
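For reference, the two recovery steps mentioned here can be sketched roughly as follows. This is not the exact invocation used on this system: the helper names, device and paths are placeholders, and the restore options shown (-D for a dry run, -i to continue past errors, -v for verbose output) are just one reasonable combination.

```shell
# Sketch only: both helper names are hypothetical.

# Try the more tolerant read-only mount using a backup tree root.
try_ro_backuproot_mount() {
    # $1 = btrfs device, $2 = mount point
    mount -o ro,usebackuproot "$1" "$2"
}

# List what btrfs restore would copy without writing anything.
restore_dry_run() {
    # $1 = btrfs device, $2 = destination directory on another filesystem
    # -D: dry run, -i: ignore errors and keep going, -v: print each path
    btrfs restore -D -i -v "$1" "$2"
}
```

Dropping -D from the restore call performs the actual copy; the destination has to live on a different, healthy filesystem with enough free space.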
>
> If you get a chance you can take an image, maybe a Btrfs developer
> will find it useful to understand why the Btrfs check is failing.
>
> btrfs-image -c9 -t4 -ss <dev> /path/to/fileoutput.image
>
> That is usually around 1/2 the size of file system metadata. It
> contains no data and filenames will be hashed.
>
>
> ------------------------------------------------------------------------------------------------------------------
>
> Chris Murphy
I tried to take an image but even that fails:
"btrfs-image -c9 -t4 -ss /dev/sdd2 /mnt/metadata.image"
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
ERROR: open ctree failed
ERROR: create failed: Success
-b11g