Re: Spurious "ghost" "parent transid verify failed" messages on 5.0.21 - with call traces

On 2019/7/1 11:39 AM, Zygo Blaxell wrote:
> On Wed, Apr 03, 2019 at 10:47:16AM -0400, Zygo Blaxell wrote:
>> On Tue, Mar 12, 2019 at 12:00:25AM -0400, Zygo Blaxell wrote:
>>> On 4.14.x and 4.20.14 kernels (probably all the ones in between too,
>>> but I haven't tested those), I get what I call "ghost parent transid
>>> verify failed" errors.  Here's an unedited recent example from dmesg:
>>>
>>> 	[16180.649285] BTRFS error (device dm-3): parent transid verify failed on 1218181971968 wanted 9698 found 9744
>>
>> These happen much less often on 5.0.x, but they still happen from time
>> to time.
> 
> I put this patch in 5.0.21:
> 
> 	commit 5abbed1af5570f1317f31736e3862e8b7df1ca8b
> 	Author: Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx>
> 	Date:   Sat May 18 17:48:59 2019 -0400
> 
> 	    btrfs: get a call trace when we hit ghost parent transid verify failures
> 
> 	diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> 	index 6fe9197f6ee4..ed961d2915a1 100644
> 	--- a/fs/btrfs/disk-io.c
> 	+++ b/fs/btrfs/disk-io.c
> 	@@ -356,6 +356,7 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
> 			"parent transid verify failed on %llu wanted %llu found %llu",
> 				eb->start,
> 				parent_transid, btrfs_header_generation(eb));
> 	+               WARN_ON(1);
> 		ret = 1;
> 	 
> 		/*
> 
> and eventually (six weeks later!) got another reproduction of this bug
> on 5.0.21:
> 
[snip]
> 
> which confirms the event comes from the LOGICAL_INO ioctl, at least.
> I had suspected that before based on timing and event log correlations,
> but now I have stack traces.
> 
> It looks like insufficient locking, i.e. the eb got modified while
> LOGICAL_INO was looking at it.

For this case, a quick and dirty fix would be to try joining a transaction
(if the fs is not RO) and hold the transaction handle to block the current
transaction from being committed.

This is definitely going to impact performance, but it should at least
avoid such transid mismatches.
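
To illustrate the idea only, here is a toy userspace model of "hold a
handle so the transaction cannot commit under the lookup". It is not btrfs
code: every type and function name (toy_fs, toy_join, toy_end,
toy_try_commit, toy_lookup_is_stable) is hypothetical, and a refused commit
stands in for what would be a blocked commit in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical types and names, not btrfs code. */
struct toy_fs {
	uint64_t generation;   /* current transaction id */
	unsigned int holders;  /* number of open transaction handles */
	bool read_only;
};

struct toy_handle {
	struct toy_fs *fs;
	uint64_t transid;      /* generation observed at join time */
};

/* Join the running transaction; refused on a read-only filesystem. */
static bool toy_join(struct toy_fs *fs, struct toy_handle *h)
{
	if (fs->read_only)
		return false;
	fs->holders++;
	h->fs = fs;
	h->transid = fs->generation;
	return true;
}

/* Drop the handle so that a pending commit may proceed. */
static void toy_end(struct toy_handle *h)
{
	assert(h->fs != NULL && h->fs->holders > 0);
	h->fs->holders--;
	h->fs = NULL;
}

/* A commit succeeds only when no handle is held; it bumps the generation. */
static bool toy_try_commit(struct toy_fs *fs)
{
	if (fs->holders > 0)
		return false;
	fs->generation++;
	return true;
}

/* A lookup that holds a handle sees one generation from start to finish. */
static bool toy_lookup_is_stable(struct toy_fs *fs)
{
	struct toy_handle h;
	bool stable;

	if (!toy_join(fs, &h))
		return false;
	(void)toy_try_commit(fs);  /* commit attempt during the lookup is refused */
	stable = (fs->generation == h.transid);
	toy_end(&h);
	return stable;
}
```

In this model the generation cannot advance while a lookup holds a handle,
which is also where the performance cost comes from: commits are delayed
for as long as the handle is held.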

In theory this should also affect any unprotected backref lookup, such as
subvolume-aware defrag.

Thanks,
Qu

> 
> As usual for the "ghost" parent transid verify failure, there's no
> persistent failure, no error reported to applications, and error counts
> in 'btrfs dev stats' are not incremented.
> 
