Re: Problem with unmountable filesystem.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2014-09-16 16:57, Chris Murphy wrote:
> 
> On Sep 16, 2014, at 8:40 AM, Austin S Hemmelgarn <ahferroin7@xxxxxxxxx> wrote:
> 
>> Based on the kernel messages, the primary issue is log corruption, and
>> in theory btrfs-zero-log should fix it.
> 
> Can you provide a complete dmesg somewhere for this initial failure, just for reference? I'm curious what this indication looks like compared to other problems.
> 
Okay, I can't really get a 'complete' dmesg, because the system panics 
on the mount failure (the filesystem in question is the system's root 
filesystem), the system has no serial ports, and I didn't think to 
build in support for console on ttyUSB0.  I can however get what the 
recovery environment (locally compiled based on buildroot) shows when I 
try to mount the filesystem:
[   30.871036] BTRFS: device label gentoo devid 1 transid 160615 /dev/sda3
[   30.875225] BTRFS info (device sda3): disk space caching is enabled
[   30.917091] BTRFS: detected SSD devices, enabling SSD mode
[   30.920536] BTRFS: bad tree block start 0 130402254848
[   30.924018] BTRFS: bad tree block start 0 130402254848
[   30.926234] BTRFS: failed to read log tree
[   30.953055] BTRFS: open_ctree failed
>>  The actual issue however, is
>> that the primary superblock appears to be pointing at a corrupted root
>> tree, which causes pretty much everything that does anything other than
>> just read the sb to fail.  The first backup sb does point to a good
>> tree, but only btrfs check and btrfs restore have any option to ignore
>> the first sb and use one of the backups instead.
> 
> Maybe use wipefs -a on this volume, which removes the magic from only the first superblock by default (you can specify another location). And then try btrfs-show-super -F which "dumps" supers with bad magic.
> 
Thanks for the suggestion, I hadn't thought of that...
> I just tried this:
> # wipefs -a /dev/sdb
> /dev/sdb: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x5c1196d7 [DON'T MATCH]
> bytenr			65536
> flags			0x1
> magic			........ [DON'T MATCH]
> […]
> # btrfs-show-super -i1 /dev/sdb
> superblock: bytenr=67108864, device=/dev/sdb
> ---------------------------------------------------------
> csum			0xfc70be19 [match]
> bytenr			67108864
> flags			0x1
> magic			_BHRfS_M [match]
> 
> So the mirror is definitely there and valid.
> # btrfs rescue super-recover -yv /dev/sdb
> No valid Btrfs found on /dev/sdb
> Usage or syntax errors
> 
> Not expected at all, man page says "Recover bad superblocks from good copies." There's a good copy, it's not being found by btrfs rescue super-recover. Seems like a bug.
> 
> 
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> 
> # btrfs check -s1 /dev/sdb
> using SB copy 1, bytenr 67108864
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> OK it finds it, maybe a --repair will fix the bad first one?
> # btrfs check -s1 /dev/sdb
> using SB copy 1, bytenr 67108864
> enabling repair mode
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> No indication of repair
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> [root@f21v ~]# btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x5c1196d7 [DON'T MATCH]
> bytenr			65536
> flags			0x1
> magic			........ [DON'T MATCH]
> 
> 
> Still not fixed. Maybe I needed to corrupt something else in the superblock other than the magic and this behavior is intentional, otherwise wipefs -a, followed by btrfsck would resurrect an intentionally wiped btrfs fs, potentially wiping out some newer file system in the process.
> 
...though maybe it's a good thing I didn't.
> 
> 
>> I'm fine using dd to replace the primary sb with one of the
>> backups, but don't know the exact parameters that would be needed.
> 
> Here's an idea:
> 
> # btrfs-show-super /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x92aa51ab [match]
> [snip]
> So I know what I'm looking for starts at LBA 65536/512
> 
> # dd if=/dev/sdb skip=128 count=4 2>/dev/null | hexdump -C
> 00000000  92 aa 51 ab 00 00 00 00  00 00 00 00 00 00 00 00  |..Q…..........|
> [snip]
> 
> And as it turns out the csum is right at the beginning, 4 bytes. So use bs of 4 bytes, seek 65536/4, count of 1. This should zero just 4 bytes starting at 65536 bytes in.
> 
> # dd if=/dev/zero of=/dev/sdb bs=4 seek=16384 count=1
> 
> Checked it with the earlier skip=128 command and it looks like everything else is intact.
> 
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x00000000 [DON'T MATCH]
> bytenr			65536
> flags			0x1
> magic			_BHRfS_M [match]
> [snip]
> OK so the csum is bad, the magic is good. Now see if btrfs rescue super-recover does anything
> # btrfs rescue super-recover /dev/sdb
> Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are you sure? [y/N]: Y
> Recovered bad superblocks successful
> *** Error in `btrfs': corrupted double-linked list: 0x0000000002289e40 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7a77e)[0x7f388663977e]
> /lib64/libc.so.6(+0x80b03)[0x7f388663fb03]
> /lib64/libc.so.6(+0x81c88)[0x7f3886640c88]
> /lib64/libc.so.6(cfree+0x4c)[0x7f38866456ec]
> btrfs[0x425ec6]
> btrfs[0x406902]
> /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f38865df0e0]
> btrfs[0x406a04]
> ======= Memory m
> [snip]
> 
> kaboom!
> 
> But was it really successful?
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x92aa51ab [match]
> [skip]
> Looks fixed. And it mounts.
> 
> NOW, I didn't actually have my first superblock pointing to a corrupt root tree. So it's possible that while the csum was fixed in my case, that the subsequent crash has not properly copied all good parts of superblock1 to superblock0. *shrug*
> 
> And since it crashes, looks like I found a bug.
> 
>> I'm using btrfs-progs 3.16 and
>> kernel 3.16.1.
> 
> So did I for all of the above.
> 
> 
Since posting this, I realized that the recovery environment I'm working from is actually btrfs-progs 3.14.1 and kernel 3.14.5, I need to make a point to update that once I get the system working again.

I've also discovered, when trying to use btrfs restore to copy out the data to a different system, that 3.14.1 restore apparently chokes on filesystem that have lzo compression turned on.  It's reporting errors trying to inflate compressed files, and I know for a fact that none of those files were even open, let alone being written to, when the system crashed.  I don't know if this is a known bug or even if it is still the case with btrfs-progs 3.16, but I figured I'd comment about it because I haven't seen anything about it anywhere.

Also, I interestingly didn't get the crash you saw above with btrfs rescue super-recover, so that might be a regression in 3.16 btrfs-progs.

Thanks for all the help.

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux