Re: Spurious "ghost" "parent transid verify failed" messages on 5.0.21 - with call traces - Fixed in 47c8495d358b (5.4.18/5.5.2?)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jun 30, 2019 at 11:39:25PM -0400, Zygo Blaxell wrote:
> On Wed, Apr 03, 2019 at 10:47:16AM -0400, Zygo Blaxell wrote:
> > On Tue, Mar 12, 2019 at 12:00:25AM -0400, Zygo Blaxell wrote:
> > > On 4.14.x and 4.20.14 kernels (probably all the ones in between too,
> > > but I haven't tested those), I get what I call "ghost parent transid
> > > verify failed" errors.  Here's an unedited recent example from dmesg:
> > > 
> > > 	[16180.649285] BTRFS error (device dm-3): parent transid verify failed on 1218181971968 wanted 9698 found 9744
> > 
> > These happen much less often on 5.0.x, but they still happen from time
> > to time.
> 
> I put this patch in 5.0.21:
> 
> 	commit 5abbed1af5570f1317f31736e3862e8b7df1ca8b
> 	Author: Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx>
> 	Date:   Sat May 18 17:48:59 2019 -0400
> 
> 	    btrfs: get a call trace when we hit ghost parent transid verify failures
> 
> 	diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> 	index 6fe9197f6ee4..ed961d2915a1 100644
> 	--- a/fs/btrfs/disk-io.c
> 	+++ b/fs/btrfs/disk-io.c
> 	@@ -356,6 +356,7 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
> 			"parent transid verify failed on %llu wanted %llu found %llu",
> 				eb->start,
> 				parent_transid, btrfs_header_generation(eb));
> 	+               WARN_ON(1);
> 		ret = 1;
> 	 
> 		/*
> 
> and eventually (six weeks later!) got another reproduction of this bug
> on 5.0.21:
> 
> 	[Sat Jun 29 16:03:51 2019] BTRFS error (device dm-24): parent transid verify failed on 192792461312 wanted 425102 found 425088
> 	[Sat Jun 29 16:03:51 2019] WARNING: CPU: 1 PID: 308 at fs/btrfs/disk-io.c:359 verify_parent_transid+0xf7/0x1e0
> 	[Sat Jun 29 16:03:51 2019] Modules linked in: isofs cpuid nfsv3 nfs fscache algif_skcipher af_alg mq_deadline softdog nfsd auth_rpcgss nfs_acl lockd grace sunrpc bnep cpufreq_userspace cpufreq_powersave cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic rfkill snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device binfmt_misc fuse nbd xt_REDIRECT ipt_REJECT nf_reject_ipv4 xt_nat xt_conntrack xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_nat_ipv6 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_mangle iptable_mangle ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter tcp_cubic dummy ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi lp dm_crypt ppdev edac_mce_amd kvm_amd ccp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_via snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi aesni_intel
> 	[Sat Jun 29 16:03:51 2019]  aes_x86_64 crypto_simd cryptd glue_helper wmi_bmof serio_raw pcspkr k10temp fam15h_power usbnet input_leds mii joydev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss radeon parport_pc snd_mixer_oss asus_atk0110 parport rtc_cmos snd_pcm snd_timer pcc_cpufreq snd soundcore evdev acpi_cpufreq sg sp5100_tco af_packet ipv6 crc_ccitt raid10 raid0 multipath linear hid_generic dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_mod crc32c_intel ohci_pci raid1 psmouse sr_mod md_mod cdrom pdc202xx_new ide_pci_generic atiixp ide_core i2c_piix4 ohci_hcd ehci_pci ehci_hcd r8169 xhci_pci realtek xhci_hcd wmi [last unloaded: ax88179_178a]
> 	[Sat Jun 29 16:03:51 2019] CPU: 1 PID: 308 Comm: crawl_257 Tainted: G        W         5.0.21-zb64-45132582ab2c+ #1
> 	[Sat Jun 29 16:03:51 2019] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101    12/02/2014
> 	[Sat Jun 29 16:03:51 2019] RIP: 0010:verify_parent_transid+0xf7/0x1e0
> 	[Sat Jun 29 16:03:51 2019] Code: 48 c1 f8 06 48 c1 e0 0c 4c 3b 74 10 50 74 2f 48 c7 c6 b0 5c 25 89 48 c7 c7 60 5d 90 89 e8 d1 5a 8f 00 85 c0 0f 85 a8 00 00 00 <0f> 0b 48 89 df 41 bc 01 00 00 00 e8 69 3f 03 00 85 c0 74 59 48 8b
> 	[Sat Jun 29 16:03:51 2019] RSP: 0018:ffffaf3c8d047730 EFLAGS: 00010296
> 	[Sat Jun 29 16:03:51 2019] RAX: 0000000000000000 RBX: ffff9c7ab0654ae0 RCX: 0000000000000000
> 	[Sat Jun 29 16:03:51 2019] RDX: 0000000000000000 RSI: ffff9c7fbc5e7cb8 RDI: ffff9c7fbc5e7cb8
> 	[Sat Jun 29 16:03:51 2019] RBP: ffffaf3c8d047770 R08: ffffffff8812a25f R09: 0000000000000001
> 	[Sat Jun 29 16:03:51 2019] R10: ffffaf3c8d047670 R11: 0000000000000000 R12: 0000000000000000
> 	[Sat Jun 29 16:03:51 2019] R13: ffff9c7f5b4ac450 R14: 0000000000067c8e R15: ffffaf3c8d047738
> 	[Sat Jun 29 16:03:51 2019] FS:  00007f53e7b74700(0000) GS:ffff9c7fbc400000(0000) knlGS:0000000000000000
> 	[Sat Jun 29 16:03:51 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 	[Sat Jun 29 16:03:51 2019] CR2: 000001df74da3000 CR3: 000000021bf52000 CR4: 00000000000406e0
> 	[Sat Jun 29 16:03:51 2019] Call Trace:
> 	[Sat Jun 29 16:03:51 2019]  btree_read_extent_buffer_pages+0xbe/0x120
> 	[Sat Jun 29 16:03:51 2019]  read_tree_block+0x42/0x70
> 	[Sat Jun 29 16:03:51 2019]  read_block_for_search.isra.11+0x1ae/0x370
> 	[Sat Jun 29 16:03:51 2019]  btrfs_next_old_leaf+0x236/0x420
> 	[Sat Jun 29 16:03:51 2019]  resolve_indirect_refs+0x486/0x910
> 	[Sat Jun 29 16:03:51 2019]  find_parent_nodes+0x443/0x1340
> 	[Sat Jun 29 16:03:51 2019]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
> 	[Sat Jun 29 16:03:51 2019]  iterate_extent_inodes+0xfb/0x3e0
> 	[Sat Jun 29 16:03:51 2019]  ? _raw_spin_unlock+0x27/0x40
> 	[Sat Jun 29 16:03:51 2019]  iterate_inodes_from_logical+0xa1/0xd0
> 	[Sat Jun 29 16:03:51 2019]  ? iterate_inodes_from_logical+0xa1/0xd0
> 	[Sat Jun 29 16:03:51 2019]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
> 	[Sat Jun 29 16:03:51 2019]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
> 	[Sat Jun 29 16:03:51 2019]  btrfs_ioctl+0xcf7/0x2cb0
> 	[Sat Jun 29 16:03:51 2019]  ? lock_acquire+0xbd/0x1c0
> 	[Sat Jun 29 16:03:51 2019]  ? lock_acquire+0xbd/0x1c0
> 	[Sat Jun 29 16:03:51 2019]  do_vfs_ioctl+0xa6/0x6e0
> 	[Sat Jun 29 16:03:51 2019]  ? do_vfs_ioctl+0xa6/0x6e0
> 	[Sat Jun 29 16:03:51 2019]  ? __fget+0x119/0x200
> 	[Sat Jun 29 16:03:51 2019]  ksys_ioctl+0x75/0x80
> 	[Sat Jun 29 16:03:51 2019]  __x64_sys_ioctl+0x1a/0x20
> 	[Sat Jun 29 16:03:51 2019]  do_syscall_64+0x65/0x1a0
> 	[Sat Jun 29 16:03:51 2019]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 	[Sat Jun 29 16:03:51 2019] RIP: 0033:0x7f53e9468427
> 	[Sat Jun 29 16:03:51 2019] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
> 	[Sat Jun 29 16:03:51 2019] RSP: 002b:00007f53e7b707c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> 	[Sat Jun 29 16:03:51 2019] RAX: ffffffffffffffda RBX: 00007f53e7b70af0 RCX: 00007f53e9468427
> 	[Sat Jun 29 16:03:51 2019] RDX: 00007f53e7b70af8 RSI: 00000000c038943b RDI: 0000000000000004
> 	[Sat Jun 29 16:03:51 2019] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f53e7b70cd0
> 	[Sat Jun 29 16:03:51 2019] R10: 0000559f3bf9cc40 R11: 0000000000000246 R12: 0000000000000004
> 	[Sat Jun 29 16:03:51 2019] R13: 00007f53e7b70af8 R14: 00007f53e7b709d8 R15: 00007f53e7b70c00
> 	[Sat Jun 29 16:03:51 2019] irq event stamp: 1551156398
> 	[Sat Jun 29 16:03:51 2019] hardirqs last  enabled at (1551156397): [<ffffffff880037da>] trace_hardirqs_on_thunk+0x1a/0x1c
> 	[Sat Jun 29 16:03:51 2019] hardirqs last disabled at (1551156398): [<ffffffff88dc51f9>] __schedule+0xd9/0xb70
> 	[Sat Jun 29 16:03:51 2019] softirqs last  enabled at (1551156396): [<ffffffff890003a0>] __do_softirq+0x3a0/0x45a
> 	[Sat Jun 29 16:03:51 2019] softirqs last disabled at (1551156389): [<ffffffff880a99a9>] irq_exit+0xe9/0xf0
> 	[Sat Jun 29 16:03:51 2019] ---[ end trace d600eda2e1469b36 ]---
> 	[Sat Jun 29 16:03:51 2019] BTRFS error (device dm-24): parent transid verify failed on 192792461312 wanted 425102 found 425088
> 	[Sat Jun 29 16:03:51 2019] WARNING: CPU: 2 PID: 308 at fs/btrfs/disk-io.c:359 verify_parent_transid+0xf7/0x1e0
> 	[Sat Jun 29 16:03:51 2019] Modules linked in: isofs cpuid nfsv3 nfs fscache algif_skcipher af_alg mq_deadline softdog nfsd auth_rpcgss nfs_acl lockd grace sunrpc bnep cpufreq_userspace cpufreq_powersave cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic rfkill snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device binfmt_misc fuse nbd xt_REDIRECT ipt_REJECT nf_reject_ipv4 xt_nat xt_conntrack xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_nat_ipv6 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_mangle iptable_mangle ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter tcp_cubic dummy ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi lp dm_crypt ppdev edac_mce_amd kvm_amd ccp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_via snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi aesni_intel
> 	[Sat Jun 29 16:03:51 2019]  aes_x86_64 crypto_simd cryptd glue_helper wmi_bmof serio_raw pcspkr k10temp fam15h_power usbnet input_leds mii joydev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss radeon parport_pc snd_mixer_oss asus_atk0110 parport rtc_cmos snd_pcm snd_timer pcc_cpufreq snd soundcore evdev acpi_cpufreq sg sp5100_tco af_packet ipv6 crc_ccitt raid10 raid0 multipath linear hid_generic dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_mod crc32c_intel ohci_pci raid1 psmouse sr_mod md_mod cdrom pdc202xx_new ide_pci_generic atiixp ide_core i2c_piix4 ohci_hcd ehci_pci ehci_hcd r8169 xhci_pci realtek xhci_hcd wmi [last unloaded: ax88179_178a]
> 	[Sat Jun 29 16:03:51 2019] CPU: 2 PID: 308 Comm: crawl_257 Tainted: G        W         5.0.21-zb64-45132582ab2c+ #1
> 	[Sat Jun 29 16:03:52 2019] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101    12/02/2014
> 	[Sat Jun 29 16:03:52 2019] RIP: 0010:verify_parent_transid+0xf7/0x1e0
> 	[Sat Jun 29 16:03:52 2019] Code: 48 c1 f8 06 48 c1 e0 0c 4c 3b 74 10 50 74 2f 48 c7 c6 b0 5c 25 89 48 c7 c7 60 5d 90 89 e8 d1 5a 8f 00 85 c0 0f 85 a8 00 00 00 <0f> 0b 48 89 df 41 bc 01 00 00 00 e8 69 3f 03 00 85 c0 74 59 48 8b
> 	[Sat Jun 29 16:03:52 2019] RSP: 0018:ffffaf3c8d047730 EFLAGS: 00010296
> 	[Sat Jun 29 16:03:52 2019] RAX: 0000000000000000 RBX: ffff9c7ab0654ae0 RCX: 0000000000000000
> 	[Sat Jun 29 16:03:52 2019] RDX: 0000000000000000 RSI: ffff9c7fbc9e7cb8 RDI: ffff9c7fbc9e7cb8
> 	[Sat Jun 29 16:03:52 2019] RBP: ffffaf3c8d047770 R08: ffffffff8812a25f R09: 0000000000000001
> 	[Sat Jun 29 16:03:52 2019] R10: ffffaf3c8d047670 R11: 0000000000000000 R12: 0000000000000000
> 	[Sat Jun 29 16:03:52 2019] R13: ffff9c7f5b4ac450 R14: 0000000000067c8e R15: ffffaf3c8d047738
> 	[Sat Jun 29 16:03:52 2019] FS:  00007f53e7b74700(0000) GS:ffff9c7fbc800000(0000) knlGS:0000000000000000
> 	[Sat Jun 29 16:03:52 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 	[Sat Jun 29 16:03:52 2019] CR2: 00000000f7e262c0 CR3: 000000021bf52000 CR4: 00000000000406e0
> 	[Sat Jun 29 16:03:52 2019] Call Trace:
> 	[Sat Jun 29 16:03:52 2019]  btree_read_extent_buffer_pages+0xbe/0x120
> 	[Sat Jun 29 16:03:52 2019]  read_tree_block+0x42/0x70
> 	[Sat Jun 29 16:03:52 2019]  read_block_for_search.isra.11+0x1ae/0x370
> 	[Sat Jun 29 16:03:52 2019]  btrfs_next_old_leaf+0x236/0x420
> 	[Sat Jun 29 16:03:52 2019]  resolve_indirect_refs+0x486/0x910
> 	[Sat Jun 29 16:03:52 2019]  find_parent_nodes+0x443/0x1340
> 	[Sat Jun 29 16:03:52 2019]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
> 	[Sat Jun 29 16:03:52 2019]  iterate_extent_inodes+0xfb/0x3e0
> 	[Sat Jun 29 16:03:52 2019]  ? _raw_spin_unlock+0x27/0x40
> 	[Sat Jun 29 16:03:52 2019]  iterate_inodes_from_logical+0xa1/0xd0
> 	[Sat Jun 29 16:03:52 2019]  ? iterate_inodes_from_logical+0xa1/0xd0
> 	[Sat Jun 29 16:03:52 2019]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
> 	[Sat Jun 29 16:03:52 2019]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
> 	[Sat Jun 29 16:03:52 2019]  btrfs_ioctl+0xcf7/0x2cb0
> 	[Sat Jun 29 16:03:52 2019]  ? lock_acquire+0xbd/0x1c0
> 	[Sat Jun 29 16:03:52 2019]  ? lock_acquire+0xbd/0x1c0
> 	[Sat Jun 29 16:03:52 2019]  do_vfs_ioctl+0xa6/0x6e0
> 	[Sat Jun 29 16:03:52 2019]  ? do_vfs_ioctl+0xa6/0x6e0
> 	[Sat Jun 29 16:03:52 2019]  ? __fget+0x119/0x200
> 	[Sat Jun 29 16:03:52 2019]  ksys_ioctl+0x75/0x80
> 	[Sat Jun 29 16:03:52 2019]  __x64_sys_ioctl+0x1a/0x20
> 	[Sat Jun 29 16:03:52 2019]  do_syscall_64+0x65/0x1a0
> 	[Sat Jun 29 16:03:52 2019]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 	[Sat Jun 29 16:03:52 2019] RIP: 0033:0x7f53e9468427
> 	[Sat Jun 29 16:03:52 2019] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
> 	[Sat Jun 29 16:03:52 2019] RSP: 002b:00007f53e7b707c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> 	[Sat Jun 29 16:03:52 2019] RAX: ffffffffffffffda RBX: 00007f53e7b70af0 RCX: 00007f53e9468427
> 	[Sat Jun 29 16:03:52 2019] RDX: 00007f53e7b70af8 RSI: 00000000c038943b RDI: 0000000000000004
> 	[Sat Jun 29 16:03:52 2019] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f53e7b70cd0
> 	[Sat Jun 29 16:03:52 2019] R10: 0000559f3bf9cc40 R11: 0000000000000246 R12: 0000000000000004
> 	[Sat Jun 29 16:03:52 2019] R13: 00007f53e7b70af8 R14: 00007f53e7b709d8 R15: 00007f53e7b70c00
> 	[Sat Jun 29 16:03:52 2019] irq event stamp: 1551156398
> 	[Sat Jun 29 16:03:52 2019] hardirqs last  enabled at (1551156397): [<ffffffff880037da>] trace_hardirqs_on_thunk+0x1a/0x1c
> 	[Sat Jun 29 16:03:52 2019] hardirqs last disabled at (1551156398): [<ffffffff88dc51f9>] __schedule+0xd9/0xb70
> 	[Sat Jun 29 16:03:52 2019] softirqs last  enabled at (1551156396): [<ffffffff890003a0>] __do_softirq+0x3a0/0x45a
> 	[Sat Jun 29 16:03:52 2019] softirqs last disabled at (1551156389): [<ffffffff880a99a9>] irq_exit+0xe9/0xf0
> 	[Sat Jun 29 16:03:52 2019] ---[ end trace d600eda2e1469b37 ]---
> 
> which confirms the event comes from the LOGICAL_INO ioctl, at least.
> I had suspected that before based on timing and event log correlations,
> but now I have stack traces.
> 
> It looks like insufficient locking, i.e. the eb got modified while
> LOGICAL_INO was looking at it.
> 
> As usual for the "ghost" parent transid verify failure, there's no
> persistent failure, no error reported to applications, and error counts
> in 'btrfs dev stats' are not incremented.

I'm pretty sure this is fixed by the third tree mod log patch, although
it's difficult to test without also applying the first two:

	6609fee8897a Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
	efad8a853ad2 Btrfs: fix use-after-free when using the tree modification log
	47c8495d358b Btrfs: fix race between adding and putting tree mod seq elements and nodes

The third patch fixes insufficient locking in LOGICAL_INO, FIEMAP,
balance, resize, scrub error reporting, and anything else that iterates
over extent references.

With all three patches applied I can no longer produce ghost PTV
failures on any kernel 4.20..5.5 under my usual stress test (rsync,
bees, snapshots, scrub, balance).  Without the third patch, I get a few
PTV failures per hour on the most recent version of my stress test,
as well as other usually-fatal errors like "unable to find logical 0
length 4096" and even the occasional "bad tree block", all of which have
no apparent effect other than emitting the kernel log messages.

Also, the affected kernel range checks out:  I've observed the ghost
PTV failures on kernels older than 4.14, and the tree mod log patches
fix bugs introduced in 3.10.

Thanks Filipe for all three patches.

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux