Re: Mount issue, mount /dev/sdc2: can't read superblock


 




On 2018/12/24 8:48 PM, Tomáš Metelka wrote:
> Hi Qu,
> 
> just 1 curious question (maybe 2) about your statement "log_root is 0":
> 
> What does it mean when log_root is non-zero?

This means there is a dirty log, typically created by fsync().

You can think of the log tree as a kind of journal, similar to the one used in ext4/XFS.

Btrfs doesn't rely on the log tree to keep its metadata consistent; it only
uses it as a faster way to implement fsync().

For a filesystem with a dirty log, btrfs itself should be consistent whether
or not we replay the log.


The only certain thing a non-zero log tree tells us is that an unexpected
power loss definitely happened.
(But not vice versa: it's entirely possible to hit an unexpected power loss
without a dirty log, either because no fsync() happened to be called during
that transaction, or because the notreelog mount option is used.)
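
If you just want to see whether a given filesystem still has a dirty log,
grepping the super dump is enough (a sketch; replace /dev/sdX with your
device):

  # Print the primary super block; a non-zero log_root means a dirty
  # log left behind by fsync() before the power loss.
  btrfs inspect-internal dump-super /dev/sdX | grep log_root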

> Because I have a similar
> problem (unmountable FS ... I don't know how much is affected, but I know
> there are 2 subsequent corrupted items in a chunk tree node)

Then the problem is not related to the log root.
Either the chunk tree got corrupted, or metadata CoW got broken.
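
If you want to see how far the chunk tree can still be read before it
breaks, dumping just that tree may help (a sketch; -t accepts the tree name
on reasonably recent btrfs-progs, and /dev/sda4 is taken from your dump
below):

  # Dump only the chunk tree; any read or checksum errors in the output
  # show roughly which node is damaged.
  btrfs inspect-internal dump-tree -t chunk /dev/sda4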

> and when I have made
> "btrfs inspect-internal dump-super":
> 
> superblock: bytenr=65536, device=/dev/sda4
> ...
> generation        2488742
> root            232408301568
> sys_array_size        97
> chunk_root_generation    2487902
> root_level        1
> chunk_root        242098421760
> chunk_root_level    1
> log_root        232433811456

log_root is only recorded in the primary super block, so it's fine that
your backup super block doesn't contain a log root.

This is the designed behavior.
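
For reference, you can print an individual super block copy directly
instead of scrolling through the full dump (a sketch; the -s option selects
the mirror on current btrfs-progs, 0 being the primary at bytenr 65536):

  # Primary copy, the only one that records log_root:
  btrfs inspect-internal dump-super -s 0 /dev/sda4
  # First backup copy at 64MiB:
  btrfs inspect-internal dump-super -s 1 /dev/sda4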

> log_root_transid    0
> log_root_level        0
> 
> superblock: bytenr=67108864, device=/dev/sda4
> ...
> generation        2488742
> root            232408301568
> sys_array_size        97
> chunk_root_generation    2487902
> root_level        1
> chunk_root        242098421760
> chunk_root_level    1
> log_root        0
> log_root_transid    0
> log_root_level        0
> 
> Unfortunately, when I try to do "btrfs rescue chunk-recover" I get
> (among other output):
> 
> "...
> 
> Unrecoverable Chunks:
>   Chunk: start = 0, len = 4194304, type = 2, num_stripes = 1
>       Stripes list:
>       [ 0] Stripe: devid = 1, offset = 0
>       No block group.
>       No device extent.
> 
> Total Chunks:        184
>   Recoverable:        183
>   Unrecoverable:    1
> 
> Orphan Block Groups:
> 
> Orphan Device Extents:
> 
> Chunk tree recovery failed
> "
> 
> And when I try "btrfs restore -m -S -v -i -D <dev>" I get only:
> Could not open root, trying backup super
> Could not open root, trying backup super
> ERROR: superblock bytenr 274877906944 is larger than device size
> 212000047104
> Could not open root, trying backup super
> 
> Is it possible to recover the data (at least some of it)? And is it worth
> upgrading to the newest btrfs-progs?

Please post the btrfs check --readonly output.

btrfs check --readonly always gives the most reliable and detailed picture
for judging any possible recovery.

Also, the kernel messages from the failed mount could help.

btrfs ins dump-tree/super is only useful when we already have some idea to verify.
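
To collect all of that in one go, something like the following should be
enough (a sketch; /dev/sda4 is the device from your dump above, and the
check is read-only, so it won't touch the filesystem):

  # Read-only check, safe to run on the damaged filesystem:
  btrfs check --readonly /dev/sda4
  # Then retry the mount and capture the kernel messages it produces:
  mount /dev/sda4 /mnt
  dmesg | tail -n 50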

Thanks,
Qu

> 
> uname -a:
> Linux tisc5 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018
> x86_64 x86_64 x86_64 GNU/Linux
> 
> btrfs-progs v4.15.1
> 
> Thanks
> Metaliza
> 
> 
> On 24. 12. 18 13:02, Qu Wenruo wrote:
>>
>>
>> On 2018/12/24 7:31 PM, Peter Chant wrote:
>>> On 12/24/18 12:58 AM, Chris Murphy wrote:
>>>> On Sat, Dec 22, 2018 at 10:22 AM Peter Chant <pete@xxxxxxxxxxxxxxx>
>>>> wrote:
>>>>
>>>>> btrfs rescue super -v /dev/sdb2
>>>> ...
>>>>> All supers are valid, no need to recover
>>>>>
>>>>>
>>>>> btrfs insp dump-s -f <dev>
>>>> ...
>>>>> generation              7937947
>>>> ...
>>>>>          backup 0:
>>>>>                  backup_tree_root:       1113909100544   gen:
>>>>> 7937935    level: 1
>>>> ...
>>>>>          backup 1:
>>>>>                  backup_tree_root:       1113907347456   gen:
>>>>> 7937936    level: 1
>>>> ...
>>>>>          backup 2:
>>>>>                  backup_tree_root:       1113911951360   gen:
>>>>> 7937937    level: 1
>>>> ...
>>>>>          backup 3:
>>>>>                  backup_tree_root:       1113907494912   gen:
>>>>> 7937934    level: 1
>>>> ...
>>>>
>>>>
>>>> The kernel wrote out three valid checksummed supers, with what seems
>>>> to be a rather significant sanity violation. The super generation and
>>>> tree root address do not match any of the backup tree roots. The
>>>> *current* tree root is supposed to be in one of the backups as well.
>>>>
>>>
>>> I wonder if this is a result of my trying to fix things?  E.g. btrfs
>>> rescue super-recover or my attempts using the tools (and kernel) in Mint
>>> 18.1 at one point?
>>
>> At least super-recover is not responsible for this.
>> btrfs check --repair, however, could indeed cause such problems.
>>
>> So that may be the case.
>>
>>>
>>> I must admit, early on I had assumed that either this file system was a
>>> simple fix or was completely trashed, so I thought I'd have a quick go
>>> at fixing it, or wipe it and start again.  But then I seemed to get
>>> close with only the one error, but unmountable.
>>>
>>>
>>>> Qu, any idea how this is even theoretically possible? Bit flip right
>>>> before the super is computed and checksummed? Seems like some kind of
>>>> corruption before checksum is computed.
>>>>
>>>>
>>>>> I'm getting suspicious of the drive as when I was trying the various
>>>>> btrfs rescue * tools I saw a 'bad block', or similar, error displayed.
>>>>> I also have a separate basic install on ext4 on the same disk.  Though
>>>>> e2fsck shows no errors and it mounts fine, I cannot log into that install.
>>>>> Maybe a coincidence, but too many bad things thrown up make me
>>>>> suspicious.  Whatever is happening, this seems to be really fighting
>>>>> me.
>>>>
>>>> I'm not sure how even a bad device accounts for the super generation
>>>> and backup mismatches. That's damn strange.
>>>
>>> I'm less suspicious of the drive now.  I've been using an ext4 partition
>>> on the same drive for a few days now, having reinstalled on that, and
>>> everything _seems_ fine.  Mind you, apart from USB sticks, I've not
>>> experienced an SSD failure.  Perhaps my HDD failure experience is not
>>> relevant, i.e. they work until they start throwing errors and then
>>> rapidly fail?
>>
>> I don't really believe a drive can so easily corrupt just certain bits
>> while leaving all other bits OK.
>>
>>>
>>>
>>>>
>>>> If you get bored with the back and forth and just want to give up,
>>>> that's fine. I suggest that if you have the time and space, to take a
>>>> btrfs-image in case Qu or some other developer wants to look at this
>>>> file system at some point. The btrfs-image is a read only process, can
>>>> be set to scrub filenames, and only contains metadata. The size of the
>>>> resulting file is around half of the metadata size shown by
>>>> 'btrfs filesystem usage' or 'btrfs filesystem df'. So you'll need that
>>>> much free space wherever you point the command's output.
>>>>
>>>> btrfs-image -ss -c9 -t4 <devicetoimage> pathtofile
>>>
>>> Just done that:
>>> bash-4.3# btrfs-image -ss -c9 -t4 /dev/sdd2
>>> /mnt/backup/btrfs_issue_dec_2018/btrfs_root_image_error_20181224.img
>>> WARNING: cannot find a hash collision for '..', generating garbage, it
>>> won't match indexes
>>>
>>>
>>>
>>>>
>>>> It might fail, if so you can try adding -w and see if that helps.
>>>
>>>
>>> OK, try with -w:
>>>
>>> OK, many many complaints about hash collisions:
>>> ...
>>> WARNING: cannot find a hash collision for 'ifup', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'catv', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'FDPC', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'LIBS', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'INTC', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'SPI', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'PDCA', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'EBI', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'SMC', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'WIFI', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'LWIP', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'HID', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'yun', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'avr4', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'avr6', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'WiFi', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'TFT', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'Knob', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'FP.h', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'SD.h', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'Beep', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'FORK', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'CHM', generating garbage, it
>>> won't match indexes
>>> WARNING: cannot find a hash collision for 'HandS', generating garbage,
>>> it won't match indexes
>>> WARNING: cannot find a hash collision for 'dm-0', generating garbage, it
>>> won't match indexes
>>>
>>>
>>> It now seems to have stopped producing output.  Can't see if it is doing
>>> something useful.  (Note: it started again, with more such messages.)
>>
>> I don't know about other developers, but normally I don't like
>> btrfs-image -ss at all.
>>
>> Even a plain btrfs-image isn't that helpful, especially considering its size.
>>
>> Anyway, from all the data you collected, I suspect it's a corruption in
>> tree block allocation, maybe a btrfs bug in an older kernel, which buried
>> a dangerous seed in the fs, breaking the metadata CoW.
>>
>> And one day an unexpected power loss made the seed grow and screwed up
>> the fs.
>>
>> Just a personal recommendation: for btrfs, especially when used with
>> older kernels, it's highly recommended to run btrfs check --readonly
>> after a power loss, before mounting the filesystem again.
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>>>
>>>> There is no log listed in the super so zero-log isn't indicated, and
>>>> also tells me there were no fsync's still flushing at the time of the
>>>> crash. The loss should be at most a minute of data, not an
>>>> inconsistent file system that can't be mounted anymore. Pretty weird.
>>>>
>>>
>>> I think I ran zero-log to see if that helped.  Given that there was no
>>> important data, and I assumed I'd either easily fix it or wipe it and
>>> start over, I may have taken the 'monkey randomly pounding the buttons'
>>> approach, short of 'btrfs check --repair'.  I only posted here as I
>>> thought I'd fixed it apart from the one error!  If it were a simple fix
>>> then it was worth asking.
>>>
>>>
>>>> What were your mount options? Defaults? Anything custom like discard,
>>>> commit=, notreelog? Any non-default mount options themselves would not
>>>> be the cause of the problem, but might suggest partial ideas for what
>>>> might have happened.
>>>>
>>> fstab states:
>>> autodefrag,ssd,discard,noatime,defaults,subvol=_r_sl14.2,compress=lzo
>>>
>>> However, I used an initrd, so I'm not sure if that is correct?
>>>
>>> Ok, digging into init within my initrd, the line where the root partition
>>> is mounted:
>>>    mount -o ro -t $ROOTFS $ROOTDEV /mnt
>>>
>>> Where $ROOTFS is:
>>> btrfs -o subvol=_r_sl14.2
>>>
>>> and $ROOTDEV is:
>>> /dev/disk/by-uuid/6496aabd-d6aa-49e0-96ca-e49c316edd8e
>>>
>>>
>>>
>>> Pete
>>>
>>


