On 2019/1/15 7:48 PM, Qu Wenruo wrote:
>
>
> On 2019/1/15 7:28 PM, Leonard Lausen wrote:
>> Hi everyone,
>>
>> I just found my btrfs filesystem to be remounted read-only with the
>> following in my journalctl [1]:
>>
>> Jan 15 08:56:40 leonard-xps13 kernel: BTRFS critical (device dm-2): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>
> Tree-checker catches the corrupted tree block, again and again.
>
>> Jan 15 08:56:40 leonard-xps13 kernel: BTRFS: error (device dm-2) in __btrfs_free_extent:6831: errno=-5 IO failure
>> Jan 15 08:56:40 leonard-xps13 kernel: BTRFS info (device dm-2): forced readonly
>> Jan 15 08:56:40 leonard-xps13 kernel: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2978: errno=-5 IO failure
>> Jan 15 08:56:40 leonard-xps13 kernel: BTRFS info (device dm-2): delayed_refs has NO entry
>>
>> Following Qu Wenruo's comment from 4th Sep 2018, I have generated the
>> following tree-dumps:
>>
>> sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
>> sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
>>
>> The root dump is at https://termbin.com/lz0l and the block dump at
>> https://termbin.com/oev5 . The number 1350630375424 does not occur in
>> the root dump. The root dump has 16715 lines, the block dump only 645.
>
> Super nice move, it shows the corruption and the cause.
>
> item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
> item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
> item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>
> See, the key objectid of item 67 is way larger than that of items 66/68.
>
> And furthermore, it indeed looks like bit rot:
> 0x18f19810000 (1714119835648)
> 0x98f19814000 (10510212874240)
> 0x18f19818000 (1714119868416)
>
> See, one bit got flipped.
>
> I don't know whether it was corrupted in memory or on the SSD,
> although I tend to believe it was caused by a memory bit flip.
> But anyway, it can be fixed by patching the corrupted leaf manually.
>
> I'm working on the fix.
> Please make sure there is no write into the fs (just in case, since the
> fs should be RO).

Here it is:
https://github.com/adam900710/btrfs-progs/tree/dirty_fix_for_leonard_lausen

You need to git checkout the branch, and then compile. (No need to install)

Then inside the directory, execute:
# ./btrfs-corrupt-block -X <device>

It will try to locate the corrupted leaf using the dump-tree result.
If it doesn't find the corrupted leaf or the content isn't what is
expected, it will just exit without writing anything.

Thanks,
Qu

>
> And prepare a LiveUSB on which you could compile btrfs-progs (needs some
> dependencies).
>
> It shouldn't take me too long to craft the fix.
>
> Thanks,
> Qu
>
>
>>
>> Would this imply that the corrupt tree block was not yet committed? What
>> actions do you recommend to take next?
>>
>> My kernel version is 4.20.2. I am writing this email via ssh from the
>> affected system on some working server. Besides the error message above
>> and the fact that the filesystem is readonly, I have not yet found any
>> issues on the affected system. Note that the error was occurring under
>> high system load while compiling a bunch of software on a tmpfs (and the
>> compilation was successful, but installation failed in the end due to
>> trying to copy to the by then read-only btrfs root filesystem).
>>
>> Does this suggest a hardware issue?
>>
>> Thank you for your help and taking the time to read this.
>>
>> Best regards
>> Leonard
>>
>> [1]: For unknown reason, the dmesg output does not reach back to the
>> time of the error, but only contains log messages from after the
>> filesystem was mounted ro.
>>
>
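
The bit-flip diagnosis in the message above can be double-checked with a
few lines of Python. This is only a sketch and is not part of the fix
branch: the function name single_bit_repairs is made up for illustration,
and it assumes that the neighbouring keys (items 66 and 68) are intact and
that exactly one bit of item 67's objectid was flipped. It tries each bit
of a 64-bit objectid and keeps the candidates that restore the key order.

```python
# Key objectids copied from the dump-tree excerpt (items 66, 67 and 68).
PREV_KEY = 1714119835648
BAD_KEY = 10510212874240
NEXT_KEY = 1714119868416


def single_bit_repairs(prev_key, bad_key, next_key, width=64):
    """Return (bit, candidate) pairs where flipping exactly one bit of
    bad_key yields a value strictly between prev_key and next_key.

    Hypothetical helper for illustration; assumes prev_key and next_key
    are themselves uncorrupted.
    """
    repairs = []
    for bit in range(width):
        candidate = bad_key ^ (1 << bit)  # flip one bit
        if prev_key < candidate < next_key:
            repairs.append((bit, candidate))
    return repairs


if __name__ == "__main__":
    for bit, candidate in single_bit_repairs(PREV_KEY, BAD_KEY, NEXT_KEY):
        print(f"flip bit {bit}: {candidate} ({hex(candidate)})")
```

With the three values from the dump, the only candidate is bit 43, which
gives 0x18f19814000 (1714119852032). That sits between the objectids of
items 66 and 68, consistent with the single flipped bit described above.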
