Re: Spurious "ghost" "parent transid verify failed" messages on 5.0.21 - with call traces

On Mon, Jul 01, 2019 at 01:56:08PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/7/1 11:39 AM, Zygo Blaxell wrote:
> > On Wed, Apr 03, 2019 at 10:47:16AM -0400, Zygo Blaxell wrote:
> >> On Tue, Mar 12, 2019 at 12:00:25AM -0400, Zygo Blaxell wrote:
> >>> On 4.14.x and 4.20.14 kernels (probably all the ones in between too,
> >>> but I haven't tested those), I get what I call "ghost parent transid
> >>> verify failed" errors.  Here's an unedited recent example from dmesg:
> >>>
> >>> 	[16180.649285] BTRFS error (device dm-3): parent transid verify failed on 1218181971968 wanted 9698 found 9744
> >>
> >> These happen much less often on 5.0.x, but they still happen from time
> >> to time.
> > 
> > I put this patch in 5.0.21:
> > 
> > 	commit 5abbed1af5570f1317f31736e3862e8b7df1ca8b
> > 	Author: Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx>
> > 	Date:   Sat May 18 17:48:59 2019 -0400
> > 
> > 	    btrfs: get a call trace when we hit ghost parent transid verify failures
> > 
> > 	diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > 	index 6fe9197f6ee4..ed961d2915a1 100644
> > 	--- a/fs/btrfs/disk-io.c
> > 	+++ b/fs/btrfs/disk-io.c
> > 	@@ -356,6 +356,7 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
> > 			"parent transid verify failed on %llu wanted %llu found %llu",
> > 				eb->start,
> > 				parent_transid, btrfs_header_generation(eb));
> > 	+               WARN_ON(1);
> > 		ret = 1;
> > 	 
> > 		/*
> > 
> > and eventually (six weeks later!) got another reproduction of this bug
> > on 5.0.21:
> > 
> [snip]
> > 
> > which confirms the event comes from the LOGICAL_INO ioctl, at least.
> > I had suspected that before based on timing and event log correlations,
> > but now I have stack traces.
> > 
> > It looks like insufficient locking, i.e. the eb got modified while
> > LOGICAL_INO was looking at it.
> 
> For this case, a quick and dirty fix would be to try joining a transaction
> (if the fs is not RO) and hold the trans handle, which blocks the current
> transaction from being committed.

Do you mean reverting "bfc61c36260c Btrfs: do not start a transaction at
iterate_extent_inodes()"?  Or something else?

I've had the spurious parent transid verify failures since at least 4.14,
years before that patch.

> This will definitely impact performance, but it should at least avoid
> such transid mismatch calls.
> 
> In theory, this should also affect any unprotected backref lookup, such
> as subvolume-aware defrag.
> 
> Thanks,
> Qu
> 
> > 
> > As usual for the "ghost" parent transid verify failure, there's no
> > persistent failure, no error reported to applications, and error counts
> > in 'btrfs dev stats' are not incremented.
> > 
> 
