On 2019/7/1 11:39 AM, Zygo Blaxell wrote:
> On Wed, Apr 03, 2019 at 10:47:16AM -0400, Zygo Blaxell wrote:
>> On Tue, Mar 12, 2019 at 12:00:25AM -0400, Zygo Blaxell wrote:
>>> On 4.14.x and 4.20.14 kernels (probably all the ones in between too,
>>> but I haven't tested those), I get what I call "ghost parent transid
>>> verify failed" errors.  Here's an unedited recent example from dmesg:
>>>
>>>     [16180.649285] BTRFS error (device dm-3): parent transid verify failed on 1218181971968 wanted 9698 found 9744
>>
>> These happen much less often on 5.0.x, but they still happen from time
>> to time.
>
> I put this patch in 5.0.21:
>
>     commit 5abbed1af5570f1317f31736e3862e8b7df1ca8b
>     Author: Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx>
>     Date:   Sat May 18 17:48:59 2019 -0400
>
>         btrfs: get a call trace when we hit ghost parent transid verify failures
>
>     diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>     index 6fe9197f6ee4..ed961d2915a1 100644
>     --- a/fs/btrfs/disk-io.c
>     +++ b/fs/btrfs/disk-io.c
>     @@ -356,6 +356,7 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
>                  "parent transid verify failed on %llu wanted %llu found %llu",
>                  eb->start,
>                  parent_transid, btrfs_header_generation(eb));
>     +    WARN_ON(1);
>          ret = 1;
>
>          /*
>
> and eventually (six weeks later!) got another reproduction of this bug
> on 5.0.21:
> [snip]
>
> which confirms the event comes from the LOGICAL_INO ioctl, at least.
> I had suspected that before based on timing and event log correlations,
> but now I have stack traces.
>
> It looks like insufficient locking, i.e. the eb got modified while
> LOGICAL_INO was looking at it.

For this case, a quick and dirty fix would be to try joining a
transaction (if the fs is not RO) and to hold the trans handle, which
blocks the current transaction from being committed.

This is definitely going to impact performance, but it should at least
avoid such a transid mismatch.

In theory the same problem should also affect any backref lookup that is
not protected this way, such as subvolume-aware defrag.

Thanks,
Qu

>
> As usual for the "ghost" parent transid verify failure, there's no
> persistent failure, no error reported to applications, and error counts
> in 'btrfs dev stats' are not incremented.
>
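
For correlating these events against other logs by timing, as described
above, a small user-space helper can pull the three numbers out of the
dmesg line. This is only a rough sketch: the message format is taken from
the format string in the quoted patch, while the struct and function names
(transid_err, parse_transid_err) are made up for illustration and are not
part of btrfs or btrfs-progs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one log line. */
struct transid_err {
	unsigned long long bytenr;  /* eb->start in the kernel message */
	unsigned long long wanted;  /* parent_transid */
	unsigned long long found;   /* generation found in the eb header */
};

/*
 * Parse a "parent transid verify failed" line from dmesg.
 * Returns 1 and fills *out on success, 0 if the line does not match.
 * Any prefix (timestamp, "BTRFS error (device ...):") is skipped.
 */
static int parse_transid_err(const char *line, struct transid_err *out)
{
	static const char marker[] = "parent transid verify failed on ";
	const char *p;

	assert(line != NULL && out != NULL);

	p = strstr(line, marker);
	if (!p)
		return 0;
	p += sizeof(marker) - 1;  /* step past the marker text */

	return sscanf(p, "%llu wanted %llu found %llu",
		      &out->bytenr, &out->wanted, &out->found) == 3;
}
```

Feeding it the dmesg line quoted at the top of this thread should fill in
the logical address and the wanted/found transids; unrelated lines return
0 and can be skipped.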
