On 2019/7/3 12:32 PM, Zygo Blaxell wrote:
> On Mon, Jul 01, 2019 at 01:56:08PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/7/1 11:39 AM, Zygo Blaxell wrote:
>>> On Wed, Apr 03, 2019 at 10:47:16AM -0400, Zygo Blaxell wrote:
>>>> On Tue, Mar 12, 2019 at 12:00:25AM -0400, Zygo Blaxell wrote:
>>>>> On 4.14.x and 4.20.14 kernels (probably all the ones in between too,
>>>>> but I haven't tested those), I get what I call "ghost parent transid
>>>>> verify failed" errors. Here's an unedited recent example from dmesg:
>>>>>
>>>>>     [16180.649285] BTRFS error (device dm-3): parent transid verify failed on 1218181971968 wanted 9698 found 9744
>>>>
>>>> These happen much less often on 5.0.x, but they still happen from time
>>>> to time.
>>>
>>> I put this patch in 5.0.21:
>>>
>>>     commit 5abbed1af5570f1317f31736e3862e8b7df1ca8b
>>>     Author: Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx>
>>>     Date: Sat May 18 17:48:59 2019 -0400
>>>
>>>         btrfs: get a call trace when we hit ghost parent transid verify failures
>>>
>>>     diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>     index 6fe9197f6ee4..ed961d2915a1 100644
>>>     --- a/fs/btrfs/disk-io.c
>>>     +++ b/fs/btrfs/disk-io.c
>>>     @@ -356,6 +356,7 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
>>>                             "parent transid verify failed on %llu wanted %llu found %llu",
>>>                             eb->start,
>>>                             parent_transid, btrfs_header_generation(eb));
>>>     +       WARN_ON(1);
>>>             ret = 1;
>>>
>>>             /*
>>>
>>> and eventually (six weeks later!) got another reproduction of this bug
>>> on 5.0.21:
>>>
>> [snip]
>>>
>>> which confirms the event comes from the LOGICAL_INO ioctl, at least.
>>> I had suspected that before based on timing and event log correlations,
>>> but now I have stack traces.
>>>
>>> It looks like insufficient locking, i.e. the eb got modified while
>>> LOGICAL_INO was looking at it.
>>
>> For this case, a quick dirty fix would be try to joining a transaction
>> (if the fs is not RO) and hold the trans handler to block current
>> transaction from being committed.
>
> Do you mean, revert "bfc61c36260c Btrfs: do not start a transaction at
> iterate_extent_inodes()"? Or something else?
>
> I've had the spurious parent transid verify failures since at least 4.14,
> years before that patch.

I mean even longer trans protection.

E.g. start a trans just before calling iterate_inodes_from_logical(),
and end it after iterate_inodes_from_logical() call.

Thanks,
Qu

>
>> This is definitely going to impact performance but at least should avoid
>> such transid mismatch call.
>>
>> In theory it should also affect any backref lookup not protected, like
>> subvolume aware defrag.
>>
>> Thanks,
>> Qu
>>
>>>
>>> As usual for the "ghost" parent transid verify failure, there's no
>>> persistent failure, no error reported to applications, and error counts
>>> in 'btrfs dev stats' are not incremented.
>>>
>>
>
>
>
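
To illustrate the idea of holding a transaction handle across the whole
iterate_inodes_from_logical() call, here is a toy userspace model in C.
It is only a sketch: every mock_* name, the struct layouts, and the
lookup_* helpers are hypothetical stand-ins invented for this example,
not the kernel's API, and the "commit attempt" in the middle of the walk
is a simplified assumption about how the race might look. In the real
kernel a commit would wait for open handles rather than fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these are NOT kernel structures or functions. */
struct mock_fs {
	unsigned long long generation; /* id of the running transaction */
	int open_handles;              /* handles joined to that transaction */
};

struct mock_trans {
	struct mock_fs *fs;
};

/* Join the running transaction; an open handle keeps it from committing. */
static struct mock_trans mock_join_transaction(struct mock_fs *fs)
{
	struct mock_trans trans = { .fs = fs };

	fs->open_handles++;
	return trans;
}

/* Release the handle so the transaction may commit again. */
static void mock_end_transaction(struct mock_trans *trans)
{
	trans->fs->open_handles--;
}

/*
 * Commit attempt: refused while any handle is open. (A real commit
 * would block and wait instead of returning false.)
 */
static bool mock_try_commit(struct mock_fs *fs)
{
	if (fs->open_handles > 0)
		return false;
	fs->generation++;
	return true;
}

/*
 * Stand-in for the backref walk: remembers the generation it started
 * with, lets a simulated concurrent commit attempt happen midway, and
 * reports whether the generation is still the one it wanted.
 */
static bool mock_backref_walk(struct mock_fs *fs)
{
	unsigned long long wanted = fs->generation;

	mock_try_commit(fs); /* simulated concurrent committer */
	return fs->generation == wanted;
}

/* Lookup with no transaction held: the commit can slip in. */
static bool lookup_unprotected(struct mock_fs *fs)
{
	return mock_backref_walk(fs);
}

/* Lookup wrapped in a handle for its whole duration, as suggested. */
static bool lookup_protected(struct mock_fs *fs)
{
	struct mock_trans trans = mock_join_transaction(fs);
	bool ok = mock_backref_walk(fs);

	mock_end_transaction(&trans);
	return ok;
}
```

In this model lookup_unprotected() sees the generation change under it,
while lookup_protected() does not, at the cost of delaying the commit
until the handle is released. That delay is the performance impact
mentioned in the quoted text.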
