Re: corrupt leaf, bad key order on kernel 5.0

On Fri, Apr 05, 2019 at 10:11:57PM +0300, Nazar Mokrynskyi wrote:
> NOTE: I do not need help with recovery, I have fully automated snapshots, backups and restoration mechanisms, the only purpose of this email is to help developers find the reason of yet another filesystem corruption and hopefully fix it.

   That's good news, at least.

> Yet another corruption of my root BTRFS filesystem happened today.
> Didn't bother to run scrub, balance or check, just created disk image for future investigation and restored everything from backup.
> 
> Here is what corruption looks like:
> [  274.241339] BTRFS info (device dm-0): disk space caching is enabled
> [  274.241344] BTRFS info (device dm-0): has skinny extents
> [  274.283238] BTRFS info (device dm-0): enabling ssd optimizations
> [  310.436672] BTRFS critical (device dm-0): corrupt leaf: root=268 block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) current (1240717 76 41451520)

   "Bad key order" is usually an indicator of faulty RAM -- a piece of
metadata gets loaded into RAM for modification, a bit gets flipped in
it (because the bit is stuck on one value), and then the csum is
computed for the page (including the faulty bit), and written out to
disk. In this case, it's not obvious, but I'd suggest that a single bit
in the second field of the key has been flipped, as 108 is 0x6c, and 76
is 0x4c -- one bit away from each other.
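
   To make that arithmetic concrete, here is a minimal Python sketch
using the two keys from the log above. It assumes each key is printed
as (objectid, type, offset) and that keys in a leaf are compared as
tuples in that order; differing_bits and the "repaired" key are
hypothetical names for illustration only, not anything from
btrfs-progs.

```python
# Keys as printed in the kernel log, assumed to be (objectid, type, offset).
prev = (1240717, 108, 41447424)
current = (1240717, 76, 41451520)


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Compare the second field of the two keys bit by bit.
flipped = differing_bits(prev[1], current[1])
print(hex(prev[1]), hex(current[1]), flipped)

# Assumption: keys in a leaf must be strictly increasing when compared
# as (objectid, type, offset) tuples.
print(prev < current)

# Hypothetical check: set the differing bit back in the current key's
# second field and see whether the ordering would then hold.
repaired = (current[0], current[1] | (1 << flipped[0]), current[2])
print(prev < repaired)
```

   If the second field of the current key had been 108 as well, the
two keys would differ only in the offset and would sort correctly,
which is consistent with a single stuck or flipped bit.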

   I recommend you check your hardware thoroughly before attempting to
rebuild the FS.

   Hugo.

> [  310.449304] BTRFS critical (device dm-0): corrupt leaf: root=268 block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) current (1240717 76 41451520)
> [  310.449309] BTRFS: error (device dm-0) in btrfs_drop_snapshot:9250: errno=-5 IO failure
> [  310.449311] BTRFS info (device dm-0): forced readonly
> [  311.266789] BTRFS info (device dm-0): delayed_refs has NO entry
> [  311.277088] BTRFS error (device dm-0): cleaner transaction attach returned -30
> 
> My system just froze when I was not looking at it and this is the state it is in now.
> File system survived from March 8th til April 05, one of the fastest corruptions in my experience.
> 
> Looks like this happened during sending incremental snapshot to the other BTRFS filesystem, since last snapshot on that one was not read-only as it should have been otherwise.
> 
> I'm on Ubuntu 19.04 with Linux kernel 5.0.5 and btrfs-progs v4.20.2.
> 
> My filesystem is on top of LUKS on NVMe SSD (SM961), I have 3 snapshots created every 15 minutes from 3 subvolumes with rotation of old snapshots (can be from tens to hundreds of snapshots at any time).
> 
> Mount options: compress=lzo,noatime,ssd
> 
> I have full disk image with corrupted filesystem and will create Qcow2 snapshots of it, so if you want me to run any experiments, including potentially destructive, including usage of custom patches to btrfs-progs to find out the reason of corruption, would be happy to help as much as I can.
> 
> P.S. I'm riding latest stable and rc kernels all the time and during last 6 months I've got about as many corruptions of different BTRFS filesystems as during 3 years before that, really worrying if you ask me.
> 

-- 
Hugo Mills             | I'm always right.
hugo@... carfax.org.uk | But I might be wrong about that.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |


