Re: btrfs list corruption and soft lockups while testing writeback error handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2017-05-11 at 07:13 -0400, Jeff Layton wrote:
> I finally got my writeback error handling test to work on btrfs (thanks,
> Chris!), by making the filesystem stripe the data and mirror the
> metadata across two devices. The test passes now, but on one run, I got
> the following list corruption warning and then a soft lockup (which is
> probably fallout from the list corruption).
> 
> I ran the test several times before and since then without this failure,
> so I don't have a clear reproducer. The kernel in this instance is
> basically a v4.11 kernel with my pile of writeback error handling
> patches on top:
> 
>     https://git.samba.org/?p=jlayton/linux.git;a=shortlog;h=refs/heads/wberr
> 
> It may be that they are a contributing factor, but this smells more like
> a bug down in btrfs. Let me know if you need other info:
> 
> ------------------------------8<-----------------------------------
> 
> [  438.341942] run fstests generic/999 at 2017-05-11 07:03:39
> [  439.453293] BTRFS: device fsid 08e9b22b-44a1-4954-a1b0-03c7c0537831 devid 1 transid 3 /dev/vda8
> [  439.465918] BTRFS: device fsid 08e9b22b-44a1-4954-a1b0-03c7c0537831 devid 2 transid 3 /dev/vda7
> [  439.603578] device-mapper: ioctl: device doesn't appear to be in the dev hash table.
> [  439.762422] BTRFS info (device dm-4): disk space caching is enabled
> [  439.763808] BTRFS info (device dm-4): has skinny extents
> [  439.764979] BTRFS info (device dm-4): flagging fs with big metadata feature
> [  439.785879] BTRFS info (device dm-4): creating UUID tree
> [  439.974266] BTRFS info (device dm-4): disk space caching is enabled
> [  439.975783] BTRFS info (device dm-4): has skinny extents
> [  440.229263] Buffer I/O error on dev dm-4, logical block 2621424, async page read
> [  440.239970] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> [  440.242459] ------------[ cut here ]------------
> [  440.243276] WARNING: CPU: 0 PID: 5162 at lib/list_debug.c:28 __list_add_valid+0x69/0xa0
> [  440.244338] list_add corruption. prev->next should be next (ffff8dd531056b08), but was ffffa93242807e90. (prev=ffffa93242807e90).
> [  440.245939] Modules linked in: btrfs xor raid6_pq binfmt_misc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_mangle ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw iptable_mangle iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core crct10dif_pclmul crc32_pclmul nfsd ghash_clmulni_intel ppdev snd_hwdep snd_pcm acpi_cpufreq snd_timer tpm_tis snd parport_pc tpm_tis_core parport pcspkr tpm i2c_piix4 auth_rpcgss soundcore floppy joydev qemu_fw_cfg virtio_balloon nfs_acl lockd grace sunrpc xfs libcrc32c qxl drm_kms_helper virtio_net
> [  440.254739]  virtio_blk virtio_console virtio_rng ttm drm crc32c_intel virtio_pci virtio_ring ata_generic virtio serio_raw pata_acpi
> [  440.256352] CPU: 0 PID: 5162 Comm: fsync-err Not tainted 4.11.0+ #52
> [  440.257534] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
> [  440.258584] Call Trace:
> [  440.259096]  dump_stack+0x63/0x86
> [  440.259618]  __warn+0xcb/0xf0
> [  440.260116]  warn_slowpath_fmt+0x5a/0x80
> [  440.260798]  ? check_parent_dirs_for_sync+0x109/0x140 [btrfs]
> [  440.261755]  __list_add_valid+0x69/0xa0
> [  440.262442]  btrfs_log_inode_parent+0x25c/0x9f0 [btrfs]
> [  440.263323]  ? btrfs_releasepage+0x20/0x20 [btrfs]
> [  440.264059]  ? wait_current_trans+0x2e/0xf0 [btrfs]
> [  440.264792]  ? kmem_cache_alloc+0x195/0x1b0
> [  440.265455]  ? join_transaction+0x27/0x420 [btrfs]
> [  440.266175]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> [  440.266965]  btrfs_sync_file+0x2b7/0x400 [btrfs]
> [  440.267655]  vfs_fsync_range+0x49/0xb0
> [  440.268266]  do_fsync+0x3d/0x70
> [  440.268806]  SyS_fsync+0x10/0x20
> [  440.269347]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> [  440.270033] RIP: 0033:0x7f7983af1b70
> [  440.270607] RSP: 002b:00007ffe13b3aa18 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
> [  440.271661] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7983af1b70
> [  440.272623] RDX: 0000000000010401 RSI: 00000000023ef030 RDI: 0000000000000004
> [  440.273696] RBP: 00007ffe13b3cc77 R08: 0000000000000000 R09: 00007f79845364c8
> [  440.274808] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000023ef030
> [  440.275873] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
> [  440.276995] ---[ end trace 878ee9789ed2d63b ]---
> [  440.278476] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> [  440.282362] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
> [  440.300180] BTRFS warning (device dm-4): lost page write due to IO error on /dev/mapper/error-test
> [  440.301502] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
> [  467.922777] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [fsync-err:5162]
> [  467.923930] Modules linked in: btrfs xor raid6_pq binfmt_misc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_mangle ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw iptable_mangle iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core crct10dif_pclmul crc32_pclmul nfsd ghash_clmulni_intel ppdev snd_hwdep snd_pcm acpi_cpufreq snd_timer tpm_tis snd parport_pc tpm_tis_core parport pcspkr tpm i2c_piix4 auth_rpcgss soundcore floppy joydev qemu_fw_cfg virtio_balloon nfs_acl lockd grace sunrpc xfs libcrc32c qxl drm_kms_helper virtio_net
> [  467.932783]  virtio_blk virtio_console virtio_rng ttm drm crc32c_intel virtio_pci virtio_ring ata_generic virtio serio_raw pata_acpi
> [  467.934561] CPU: 0 PID: 5162 Comm: fsync-err Tainted: G        W       4.11.0+ #52
> [  467.935662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
> [  467.936838] task: ffff8dd524511d40 task.stack: ffffa93242804000
> [  467.937724] RIP: 0010:__list_del_entry_valid+0x40/0xc0
> [  467.938518] RSP: 0018:ffffa93242807d28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [  467.939540] RAX: 0000000000000001 RBX: ffffa93242807e78 RCX: ffffa93242807e90
> [  467.940510] RDX: dead000000000200 RSI: ffffa93242807e90 RDI: ffffa93242807e90
> [  467.941487] RBP: ffffa93242807e20 R08: ffffa93242807e90 R09: 0000000000000000
> [  467.942457] R10: ffffa93242807c40 R11: 00000000000001b5 R12: 0000000000000000
> [  467.943424] R13: ffffa93242807e78 R14: ffff8dd531056b08 R15: ffffa93242807e90
> [  467.944391] FS:  00007f7984520800(0000) GS:ffff8dd53fc00000(0000) knlGS:0000000000000000
> [  467.945458] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  467.946271] CR2: 000055d401203460 CR3: 000000007134f000 CR4: 00000000000406f0
> [  467.947232] Call Trace:
> [  467.947738]  ? btrfs_sync_log+0x434/0xc00 [btrfs]
> [  467.948482]  btrfs_sync_file+0x3b8/0x400 [btrfs]
> [  467.949216]  ? btrfs_sync_file+0x3b8/0x400 [btrfs]
> [  467.949941]  vfs_fsync_range+0x49/0xb0
> [  467.950496]  do_fsync+0x3d/0x70
> [  467.950992]  SyS_fsync+0x10/0x20
> [  467.951496]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> [  467.952121] RIP: 0033:0x7f7983af1b70
> [  467.952657] RSP: 002b:00007ffe13b3aa18 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
> [  467.953610] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7983af1b70
> [  467.954592] RDX: 0000000000010401 RSI: 00000000023ef030 RDI: 0000000000000004
> [  467.955586] RBP: 00007ffe13b3cc77 R08: 0000000000000000 R09: 00007f79845364c8
> [  467.956640] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000023ef030
> [  467.957649] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
> [  467.958639] Code: 48 39 d0 74 27 48 ba 00 02 00 00 00 00 ad de 49 39 d0 74 3a 4d 8b 00 4c 39 c7 75 51 4c 8b 40 08 4c 39 c7 75 67 b8 01 00 00 00 5d <c3> 49 89 c0 48 89 f9 48 c7 c2 c8 ca c8 81 be 2f 00 00 00 48 c7 
> [  470.383161] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
> [  470.384834] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
> [  470.386505] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
> [  470.388133] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> [  470.392267] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
> [  470.393876] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
> [  470.397574] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 11, rd 0, flush 0, corrupt 0, gen 0
> [  470.399134] BTRFS error (device dm-4): bdev /dev/mapper/error-test errs: wr 12, rd 0, flush 0, corrupt 0, gen 0
> 
> 

(cc'ing Liu Bo since we were discussing this earlier this week)

I can't reproduce this on stock v4.11, so I think this is a bug in my
series.

I think this is due to the differences in how errors are being reported
from filemap_fdatawait_range now causing some transactions to end up
being freed while they're still on the log_ctxs list. I'm working on
hunting down the problem now.

Sorry for the noise!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux