Re: Btrfs check reports errors, filesystem seems fine

> Currently, one possible solution is to delete the whole subvolume.
Can btrfs send (to an external drive) followed by btrfs receive back fix it? Or should I use plain cp/rsync?
 
> If you have a full backup, then you could try it.
It is my root subvolume (sensitive data is on other ones), thus it is expendable. Can btrfs check --repair damage other subvolumes?

> Any idea about the reproducer? Or just random memory corruption?
No idea why and no idea when. This partition is about a year and a half old, and I ran btrfs check for the first time only about a month ago.
Also I ran memtest recently and it didn't find any errors.

On Friday, July 14, 2017 14:28:58 MSK, Qu Wenruo wrote:
> 
> On 2017-07-14 18:12, Filippe LeMarchand wrote:
> > First "rm" on deprecated.txt worked, but the file is still there. Neither the file nor its parent directory can be deleted:
> > 
> > $ sudo rm /usr/share/doc/packages/util-linux/deprecated.txt
> > rm: cannot remove '/usr/share/doc/packages/util-linux/deprecated.txt': No such file or directory
> > 
> > $ sudo rm -rf /usr/share/doc/packages/util-linux/
> > rm: cannot remove '/usr/share/doc/packages/util-linux/': Directory not empty
> > 
> > $ sudo ls -l /usr/share/doc/packages/util-linux/
> > ls: cannot access '/usr/share/doc/packages/util-linux/deprecated.txt': No such file or directory
> > total 0
> > -????????? ? ? ? ?            ? deprecated.txt
> 
> Similar behavior also shows up with a manually crafted image in our
> environment.
>
> Su Yue has sent patches that enhance error detection and add a test case
> for it, but repair is not supported.
> 
> > 
> > Reinstalling the util-linux package gives me two copies of that file (two copies are also present in the previous snapshot):
> > 
> > $ ls -l /usr/share/doc/packages/util-linux/
> > total 104
> > -rw-r--r-- 1 root root 18092 Jul 20  2016 COPYING
> > -rw-r--r-- 1 root root  1391 Jul 20  2016 COPYING.BSD-3
> > -rw-r--r-- 1 root root 26530 Jul 20  2016 COPYING.LGPLv2.1
> > -rw-r--r-- 1 root root  1824 Jul 20  2016 COPYING.UCB
> > -rw-r--r-- 1 root root   555 Jul 20  2016 README.licensing
> > -rw-r--r-- 1 root root  3257 Jul 20  2016 blkid.txt
> > -rw-r--r-- 1 root root  2264 Jul 20  2016 cal.txt
> > -rw-r--r-- 1 root root  1913 Jul 20  2016 col.txt
> > -rw-r--r-- 1 root root  2825 May  2 13:17 deprecated.txt
> > -rw-r--r-- 1 root root  2825 May  2 13:17 deprecated.txt
> > -rw-r--r-- 1 root root   992 Jul 20  2016 getopt.txt
> > -rw-r--r-- 1 root root  2437 Nov  2  2016 howto-debug.txt
> > -rw-r--r-- 1 root root   148 Jul 20  2016 hwclock.txt
> > -rw-r--r-- 1 root root  2617 Jul 20  2016 modems-with-agetty.txt
> > -rw-r--r-- 1 root root   522 Jul 20  2016 mount.txt
> > -rw-r--r-- 1 root root   448 Jul 20  2016 pg.txt
> > 
> > So, is this situation actually dangerous? And what can I do to gather more information for you?
> 
> The situation won't get worse. I'd recommend not taking any snapshots of
> those subvolumes (4546 and 5134), to limit the corruption to those
> subvolumes.
>
> However, there is no easy way to fix it yet.
>
> Currently, one possible solution is to delete the whole subvolume.
> If no further errors occur, that may fix it.
> 
> IIRC btrfs check --repair in original mode has a
> DIR_ITEM/DIR_INDEX/INODE_REF repair function, but I'm not sure it can
> handle this case well.
> btrfs check --repair *MAY* fix it, or it may make things worse.
> If you have a full backup, then you could try it.
> Otherwise, don't try it at all.
> 
> Another option is a repair program written specifically for your case.
> We can modify btrfs-corrupt-block to delete just the corrupted DIR_ITEM
> (the ".sxt" one) and the related DIR_INDEX/INODE_REF.
> But I'd only choose this if you really need it fixed as soon as possible.
>
> At least we have a solution for it.
> I'm more concerned about how this happened.
> 
> Any idea about the reproducer? Or just random memory corruption?
> 
> Thanks,
> Qu
> > 
> > On Friday, July 14, 2017 9:11:06 MSK, Qu Wenruo wrote:
> >> Thanks for your dump.
> >>
> >> The direct cause of the problem is now clear.
> >>
> >> A single corrupted DIR_ITEM is causing it.
> >> Furthermore, original mode btrfs check can't detect it; we will
> >> fix that soon.
> >>
> >> The corrupted DIR_ITEM is as follows:
> >> 	item 72 key (79177 DIR_ITEM 54846528) itemoff 12380 itemsize 88
> >> 		location key (4222342 INODE_ITEM 0) type FILE
> >> 		transid 170929 data_len 0 name_len 14
> >> 		name: deprecated.sxt
> >> 		location key (13590433 INODE_ITEM 0) type FILE
> >> 		transid 796448 data_len 0 name_len 14
> >> 		name: deprecated.txt
> >>
> >> Dir inode 79177 has 2 child inodes, named "deprecated.sxt"
> >> (ino=4222342) and "deprecated.txt" (ino=13590433).
> >>
> >> But something goes wrong here:
> >>
> >> 1) Hash of "deprecated.sxt" doesn't match 54846528
> >>
> >> 2) The inode backref of inode 4222342 says its filename is "deprecated.txt".
> >> This is also captured by the dump:
> >> 	item 40 key (4222342 INODE_REF 79177) itemoff 7189 itemsize 24
> >> 		inode ref index 417 namelen 14 name: deprecated.txt
> >>
> >> 3) DIR_INDEX also shows that filename for inode 4222342 should be
> >> "deprecated.txt"
> >> 	item 87 key (79177 DIR_INDEX 417) itemoff 11757 itemsize 44
> >> 		location key (4222342 INODE_ITEM 0) type FILE
> >> 		transid 170929 data_len 0 name_len 14
> >> 		name: deprecated.txt
> >>
> >> So, generally speaking, the DIR_ITEM is wrong and is causing the problem.
> >>
> >> But the root cause is still unknown.
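
The three observations above amount to a cross-reference check between DIR_ITEM, INODE_REF and DIR_INDEX. A minimal sketch of that check follows, using only the values from the quoted dump. The `DirEntry` type and `find_name_mismatches` helper are hypothetical names for illustration, not btrfs-progs code, and the excerpt contains no INODE_REF for inode 13590433, so that entry is skipped rather than judged.

```python
from typing import NamedTuple

class DirEntry(NamedTuple):
    parent: int  # inode number of the directory
    ino: int     # inode number the entry points to
    name: str

# Values copied from the quoted dump (subvolume 4546, dir inode 79177).
dir_items = [
    DirEntry(79177, 4222342, "deprecated.sxt"),
    DirEntry(79177, 13590433, "deprecated.txt"),
]
# DIR_INDEX entries keyed by (parent inode, index).
dir_indexes = {(79177, 417): DirEntry(79177, 4222342, "deprecated.txt")}
# INODE_REF entries keyed by (child inode, parent inode) -> (index, name).
inode_refs = {(4222342, 79177): (417, "deprecated.txt")}

def find_name_mismatches(dir_items, dir_indexes, inode_refs):
    """Return (ino, dir_item_name, backref_name, dir_index_name) for every
    DIR_ITEM whose name disagrees with the INODE_REF of its target inode."""
    mismatches = []
    for item in dir_items:
        ref = inode_refs.get((item.ino, item.parent))
        if ref is None:
            continue  # backref not in the quoted excerpt; nothing to compare
        index, ref_name = ref
        index_entry = dir_indexes.get((item.parent, index))
        index_name = index_entry.name if index_entry else None
        if item.name != ref_name:
            mismatches.append((item.ino, item.name, ref_name, index_name))
    return mismatches

for ino, item_name, ref_name, index_name in find_name_mismatches(
        dir_items, dir_indexes, inode_refs):
    print(f"inode {ino}: DIR_ITEM says {item_name!r}, "
          f"INODE_REF says {ref_name!r}, DIR_INDEX says {index_name!r}")
```

Only the DIR_ITEM entry pointing at inode 4222342 is reported: both of that inode's back references agree on "deprecated.txt", while the DIR_ITEM alone says "deprecated.sxt".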
> >>
> >> What I can see is that the corrupted DIR_ITEM points to a very old inode,
> >> whose mtime dates back to 2016-09-07,
> >> while the good DIR_ITEM points to a newer inode, whose mtime is
> >> 2017-05-02.
> >>
> >> Stranger still, there should not be two child inodes with the same
> >> filename ("deprecated.txt"; I assume the sxt one is caused by memory
> >> bit corruption).
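
As a quick sanity check on the corruption hypothesis, the two names from the dump can be compared byte by byte. This is only an illustrative sketch (the `bit_diff` helper is a made-up name); it shows how the names differ and says nothing about how the difference arose.

```python
# Compare the two names from the dump byte by byte and count differing bits.
good = b"deprecated.txt"
bad = b"deprecated.sxt"

def bit_diff(a: bytes, b: bytes):
    """Return (offset, xor_mask, differing_bit_count) for each differing byte."""
    assert len(a) == len(b)
    return [(i, x ^ y, bin(x ^ y).count("1"))
            for i, (x, y) in enumerate(zip(a, b)) if x != y]

diffs = bit_diff(good, bad)
for offset, mask, nbits in diffs:
    print(f"byte {offset}: {good[offset]:#04x} vs {bad[offset]:#04x}, "
          f"xor {mask:#04x}, {nbits} bit(s) differ")
```

The names differ in a single byte, 't' (0x74) versus 's' (0x73). That is a difference of three low-order bits (xor 0x07), so it is not a flip of just one bit.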
> >>
> >> So any details about operations on util-linux/deprecated.txt will
> >> help us locate the root cause in the kernel.
> >>
> >> Thanks,
> >> Qu
> >>
> >>
> >> On 2017-07-12 21:11, Filippe LeMarchand wrote:
> >>> Done, files added to the same GDrive folder with corresponding names.
> >>> If it matters, subvol 4546 is my root filesystem (r/w snapshot created with snapper rollback), and 5134 is its snapshot.
> >>>
> >>> On Wednesday, July 12, 2017 15:44:52 MSK, Qu Wenruo wrote:
> >>>>
> >>>> On 2017-07-12 19:12, Filippe LeMarchand wrote:
> >>>>>> Maybe something went wrong with grep, so that it skipped "(79177"?
> >>>>> Yes, my bad. I have now used the pattern grep -E "\(79177| 79177"; the file on GDrive is updated.
> >>>>
> >>>> It looks much better, thanks.
> >>>>
> >>>>>
> >>>>> And btrfs check --mode=lowmem gives this:
> >>>>>
> >>>>> checking extents
> >>>>> ERROR: extent[1609877700608, 94208] referencer count mismatch (root: 260, owner: 61720, offset: 6742016) wanted: 2, have: 5
> >>>>> ERROR: extent[1630301675520, 39583744] referencer count mismatch (root: 260, owner: 5847554, offset: 0) wanted: 36, have: 114
> >>>>> ERROR: extent[1658646986752, 10551296] referencer count mismatch (root: 274, owner: 283675, offset: 0) wanted: 2, have: 5
> >>>>> ERROR: extent[1672239132672, 84381696] referencer count mismatch (root: 274, owner: 2521382, offset: 0) wanted: 21, have: 25
> >>>>> ERROR: errors found in extent allocation tree or chunk allocation
> >>>>
> >>>> This looks very much like an exposed lowmem mode bug.
> >>>> Feel free to ignore these errors from the extent tree; they are just false
> >>>> alerts.
> >>>>
> >>>>> checking free space cache
> >>>>> checking fs roots
> >>>>> ERROR: root 4546 DIR_ITEM[79177 54846528] relative INODE_REF missing namelen 14 filename deprecated.sxt filetype 1
> >>>>
> >>>> The error report is much better than in original mode, and that's what I need.
> >>>>
> >>>> Now I can wipe out all other noise as we know exactly which tree and
> >>>> which DIR_ITEM/INODE_REF is causing the problem.
> >>>>
> >>>> Would you please update the dump result with "-t 4546" passed to
> >>>> btrfs-debug-tree like:
> >>>>
> >>>> # btrfs-debug-tree -t 4546 <device>| grep 79177
> >>>>
> >>>> Only "-t 4546" is added, to only dump the result of subvolume 4546.
> >>>> As always, all 3 grep results (2 "deprecated" and one 79177) need to be
> >>>> updated.
> >>>>
> >>>> And it seems that my previous assumption is still right for this case.
> >>>> If it's caused by the kernel, your dump will definitely help us locate
> >>>> the problem.
> >>>>
> >>>>> ERROR: root 4546 INODE REF[4222342 79177] and DIR_ITEM[79177 54846528] mismatch namelen 14 filename deprecated.txt filetype 1
> >>>>> ERROR: root 5134 DIR_ITEM[79177 54846528] relative INODE_REF missing namelen 14 filename deprecated.sxt filetype 1
> >>>>
> >>>> Also for root 5134 please.
> >>>>
> >>>> Thanks,
> >>>> Qu
> >>>>
> >>>>> ERROR: errors found in fs roots
> >>>>> Checking filesystem on /dev/sda2
> >>>>> UUID: 12c84aa3-ce65-4390-807e-a72cc8a7445e
> >>>>> found 153429872640 bytes used, error(s) found
> >>>>> total csum bytes: 121991672
> >>>>> total tree bytes: 1940160512
> >>>>> total fs tree bytes: 1683767296
> >>>>> total extent tree bytes: 103841792
> >>>>> btree space waste bytes: 310722480
> >>>>> file data blocks allocated: 842455031808
> >>>>>     referenced 159286636544
> >>>>>
> >>>>> On Wednesday, July 12, 2017 10:15:18 MSK, Qu Wenruo wrote:
> >>>>>> Sorry for the late reply.
> >>>>>>
> >>>>>> After investigating the dumps, I found the output quite strange.
> >>>>>>
> >>>>>> 1) Mismatching output
> >>>>>> In "btrfs-debug-tree-grep-79177.txt" 79177 appears only as the offset of
> >>>>>> INODE_REF keys, while 79177 as the objectid of DIR_ITEM/DIR_INDEX keys is
> >>>>>> not there at all.
> >>>>>>
> >>>>>> Yet "btrfs-debug-tree-grep-deprecated-txt.txt" does contain the expected
> >>>>>> 79177 DIR_ITEM/DIR_INDEX.
> >>>>>>
> >>>>>> Maybe something went wrong with grep, so that it skipped "(79177"?
> >>>>>>
> >>>>>> 2) Mismatched hash
> >>>>>> The main problem I found is that, for key (79177 DIR_ITEM 54846528), the
> >>>>>> number 54846528 is the hash(crc32c) of filename, and it contains 2
> >>>>>> items, one for "deprecated.txt" and one for "deprecated.sxt".
> >>>>>>
> >>>>>> But we found that 54846528 only matches the hash for "deprecated.txt",
> >>>>>> not "deprecated.sxt".
> >>>>>>
> >>>>>> I think that's the main problem.
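
The hash comparison described above can be reproduced with a short sketch. It assumes the btrfs name hash is plain CRC-32C seeded with (u32)~1, without the final inversion, as I understand the kernel and btrfs-progs to do; `crc32c_raw` and `name_hash` are illustrative names. No expected result is claimed here; the loop just prints whether each name hashes to the key offset from the dump.

```python
def crc32c_raw(seed: int, data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
    The seed is used as given and no final inversion is applied."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc

def name_hash(name: bytes) -> int:
    # Assumption: btrfs seeds the name hash with (u32)~1 == 0xFFFFFFFE.
    return crc32c_raw(0xFFFFFFFE, name)

DIR_ITEM_OFFSET = 54846528  # from key (79177 DIR_ITEM 54846528)

for name in (b"deprecated.txt", b"deprecated.sxt"):
    h = name_hash(name)
    verdict = "matches" if h == DIR_ITEM_OFFSET else "does not match"
    print(f"{name.decode()}: hash {h} {verdict} the key offset")
```

Whatever the seed, two names of equal length that differ only inside one byte cannot share a CRC-32C value, so at most one of the two entries can legitimately live under this key.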
> >>>>>>
> >>>>>> BTW, would you please try "btrfs check --mode=lowmem" to see if lowmem
> >>>>>> mode reports similar (well, output may differ) error?
> >>>>>>
> >>>>>> If lowmem mode also reports error on such DIR_ITEM, I'm pretty sure
> >>>>>> that's the problem.
> >>>>>>
> >>>>>> However it may take some time before we can fix it in repair mode.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Qu
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2017-07-04 21:24, Filippe LeMarchand wrote:
> >>>>>>> Sure, here it is:
> >>>>>>> https://drive.google.com/drive/folders/0B1ax9Am81gx9YjJBVVA0LXRHeGc
> >>>>>>>
> >>>>>>> On Tuesday, July 4, 2017 16:16:36 MSK, Lu Fengqi wrote:
> >>>>>>>> On Mon, Jul 03, 2017 at 08:34:52AM +0800, Qu Wenruo wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> At 07/01/2017 07:59 PM, Filippe LeMarchand wrote:
> >>>>>>>>>> Hello everyone.
> >>>>>>>>>>
> >>>>>>>>>> I have a btrfs root partition on an Intel 530 SSD, which mounts without errors and seems to work fine,
> >>>>>>>>>> but `btrfs check` gives me the following output (and --repair doesn't remove the errors):
> >>>>>>>>>>
> >>>>>>>>>> enabling repair mode
> >>>>>>>>>> Checking filesystem on /dev/sda2
> >>>>>>>>>> UUID: 12c84aa3-ce65-4390-807e-a72cc8a7445e
> >>>>>>>>>> checking extents
> >>>>>>>>>> Fixed 0 roots.
> >>>>>>>>>> checking free space cache
> >>>>>>>>>> cache and super generation don't match, space cache will be invalidated
> >>>>>>>>>> checking fs roots
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>
> >>>>>>>>> This means that the dir whose inode number is 79177 has a child inode
> >>>>>>>>> pointer pointing to deprecated.sxt.
> >>>>>>>>>
> >>>>>>>>> But it has no dir index and no corresponding inode ref, which breaks
> >>>>>>>>> the cross-reference rule of btrfs.
> >>>>>>>>>
> >>>>>>>>> Would you please run the following command to dump needed info for us to
> >>>>>>>>> debug?
> >>>>>>>>>
> >>>>>>>>> # btrfs-debug-tree /dev/sda2 | grep 79177 -C 10
> >>>>>>>>>
> >>>>>>>>> and
> >>>>>>>>>
> >>>>>>>>> # btrfs-debug-tree /dev/sda2 | grep deprecated.sxt -C 10
> >>>>>>>>>
> >>>>>>>>> and
> >>>>>>>>>
> >>>>>>>>> # btrfs-debug-tree /dev/sda2 | grep deprecated.txt -C 10
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Considering the output has both .txt and .sxt, I think that's the problem.
> >>>>>>>>> But such a bit flip should be detected by the tree block csum.
> >>>>>>>>> I'm not sure what's wrong with it.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Qu
> >>>>>>>>>
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> 	unresolved ref dir 79177 index 0 namelen 14 name deprecated.sxt filetype 1 errors 6, no dir index, no inode ref
> >>>>>>>>>> 	unresolved ref dir 79177 index 417 namelen 14 name deprecated.txt filetype 1 errors 1, no dir item
> >>>>>>>>>> checking csums
> >>>>>>>>>> checking root refs
> >>>>>>>>>> found 23421812736 bytes used err is 0
> >>>>>>>>>> total csum bytes: 21531608
> >>>>>>>>>> total tree bytes: 776650752
> >>>>>>>>>> total fs tree bytes: 711278592
> >>>>>>>>>> total extent tree bytes: 36798464
> >>>>>>>>>> btree space waste bytes: 116002036
> >>>>>>>>>> file data blocks allocated: 850546470912
> >>>>>>>>>>       referenced 27611987968
> >>>>>>>>>>
> >>>>>>>>>> Is it dangerous and what should I do about it?
> >>>>>>>>>>
> >>>>>>>>>> I also tried --clear-space-cache, but it just removes the line about space cache.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I'm afraid that your mail may be rejected because the attachment size
> >>>>>>>> exceeds the allowable limit (100kB) of the btrfs mailing list. Could you
> >>>>>>>> share the attachment via Google Drive?
> >>>>>>>>
> >>>>>>>> Lastly, as Qu is short on time, I will assist you with this issue.
> >>>>>>>>
> 
