2018-01-22 22:22 GMT+01:00 Hugo Mills <hugo@xxxxxxxxxxxxx>:
> On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
>> Hi!
>>
>> I really like the features of BTRFS, especially deduplication,
>> snapshotting and checksumming. However, when using it on my laptop the
>> last couple of years, it has become corrupted a lot of times.
>> Sometimes I have managed to fix the problems (at least so much that I
>> can continue to use the filesystem) with check --repair, but several
>> times I had to recreate the file system and reinstall the operating
>> system.
>>
>> I am guessing the corruptions might be the results of unclean
>> shutdowns, mostly after system hangs, but also because of running out
>> of battery sometimes?
>> Furthermore, the power-led has recently started blinking (also when
>> the power-cable is plugged in), I guess because of an old and bad
>> battery. Maybe the current corruption also can have something to do
>> with this? However I almost always run with power cable plugged in in
>> the last year, only on battery a few seconds a few times when moving
>> the laptop.
>>
>> Currently, I can only mount the filesystem readonly, it goes readonly
>> automatically if I try to mount it normally.
>>
>> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
>> localhost:~ # uname -r
>> 4.14.13-1-default
>> localhost:~ # btrfs --version
>> btrfs-progs v4.14.1
>>
>> localhost:~ # btrfs check -p /dev/sda12
>> Checking filesystem on /dev/sda12
>
> [fixing up bad paste]
>
>> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>> bad key ordering 159 160 bad block 690436964352
>> ERROR: errors found in extent allocation tree or chunk allocation
>> checking free space cache [.]
>> checking fs roots [o]
>> checking csums
>> bad key ordering 159 160
>> Error looking up extent record -1
>
> [snip]
>
>> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352 /dev/sda12
>> btrfs-progs v4.14.1
>> leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>> leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>> fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>> chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
>> .
>> .
>> .
>> item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>> refs 1 gen 821 flags DATA
>> extent data backref root 287 objectid 51665 offset 0 count 1
>> item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>> refs 1 gen 821 flags DATA
>> extent data backref root 287 objectid 51666 offset 0 count 1
>> item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
>> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
>> btrfs(+0x365c6)[0x55bdfaada5c6]
>> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
>> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
>> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
>> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
>> btrfs(main+0x7d)[0x55bdfaac7d4d]
>> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
>> btrfs(_start+0x2a)[0x55bdfaac7e5a]
>> Aborted (core dumped)
>
> Wow, I've never seen it do that before. It's the next thing I'd
> have asked for, so it's good you've preempted it.
>
> The main thing is that bad key ordering is almost always due to RAM
> corruption. That's either bad RAM, or dodgy power regulation -- the
> latter could be the PSU, or capacitors on the motherboard. (In this
> case, it might also be something funny with the battery).
>
> I would definitely recommend a long run of memtest86. At least 8
> hours, preferably 24. If you get errors repeatedly in the same place,
> it's the RAM. If they appear randomly, it's probably the power
> regulation.
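
For readers unfamiliar with the "bad key ordering" message: items in a btrfs leaf are sorted by their (objectid, type, offset) key, and each key must be strictly greater than the one before it. The following is only a rough sketch of that invariant in Python, not the code btrfs-progs actually runs. The first three keys are taken from items 157 to 159 of the dump above. The key of item 160 is not shown in the dump, so the fourth key here is purely hypothetical, as are the function name `first_bad_pair` and the choice of bit 34. It illustrates how a single flipped bit in memory could be enough to break the ordering.

```python
# BTRFS_EXTENT_ITEM_KEY, the numeric type printed as EXTENT_ITEM in the dump
EXTENT_ITEM = 168


def first_bad_pair(keys):
    """Return (i, i + 1) for the first adjacent pair of keys that is not
    strictly increasing, or None if the whole list is correctly ordered.

    Keys are (objectid, type, offset) tuples; Python compares tuples
    field by field, the same order in which btrfs compares key fields.
    """
    for i in range(len(keys) - 1):
        if keys[i] >= keys[i + 1]:
            return (i, i + 1)
    return None


# Keys of items 157, 158 and 159 as printed by dump-tree above.
leaf = [
    (22732500992, EXTENT_ITEM, 16384),
    (22732517376, EXTENT_ITEM, 16384),
    (22732533760, EXTENT_ITEM, 16384),
]

# HYPOTHETICAL next key: the real key of item 160 is not in the dump.
hypothetical_next = (22732533760 + 16384, EXTENT_ITEM, 16384)

# The same hypothetical key with one arbitrarily chosen bit (bit 34) flipped,
# as a stand-in for a single-bit RAM error.
flipped = (hypothetical_next[0] ^ (1 << 34), EXTENT_ITEM, 16384)

print(first_bad_pair(leaf + [hypothetical_next]))
print(first_bad_pair(leaf + [flipped]))
```

With the intact hypothetical key the check finds nothing wrong. With the flipped one it reports the last pair in the list, which corresponds to the "159 160" pair that btrfs check complains about.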

Thanks for the suggestion, I will try to do this in the next few days.

> [snip]
>
>>
>> The filesystem had become pretty full, I had planned to increase the
>> Btrfs-partition size before it became corrupt.
>>
>> Active kernel when the filesystem went read only: OpenSUSE Linux
>> 4.14.14-1.geef6178-default, from the
>> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
>> repository.
>>
>> Fstab mount options: noatime,autodefrag (I have been using the option
>> nossd with older kernels one period in the past on the filesystem).
>>
>> If it matters, I have been running duperemove many times on the
>> filesystem since creation.
>>
>> To test the RAM, I have been running mprime Blend-test for 24 hours
>> after the corruption without any error or warning.
>
> Of all of the bad key order errors I've seen (dozens), I think
> there were a whole two which turned out not to be obviously related to
> corrupt RAM. I still say that it's most likely the hardware.

Okay, thank you for sharing your experience with me.

>
>> Is there a way I can try to repair this filesystem without the need to
>> recreate it and reinstall the operating system? A reinstall including
>> all currently installed packages, and restoring all current system
>> settings, would probably take some time for me to do.
>> If it is currently not repairable, it would be nice if this kind of
>> corruption could be repaired in the future, even if losing a few
>> files. Or if the corruptions could be avoided in the first place.
>
> Given that the current tools crash, the answer's a definite
> no. However, if you can get a developer interested, they may be able
> to write a fix for it, given an image of the FS (using btrfs-image).
>

Okay, will try to produce and upload an image within the next week.

> [snip]
>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>> on the laptop, only on the Btrfs file systems.
>
> You've never _noticed_ them. :)
>
> Hugo.
>
> --
> Hugo Mills             | ... one ping(1) to rule them all, and in the
> hugo@... carfax.org.uk | darkness bind(2) them.
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |                                          Illiad

Thank you for your answers.

Claes
