Re: Regression: kernel 4.0.0-rc1 - soft lockups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 03, 2015 at 08:31:10AM +0100, Marcel Ritter wrote:
> Hi,
> 
> yes it is reproducible.
> 
> Just creating a new btrfs filesystem (14 disks, data/mdata raid6,
> latest git btrfs-progs)
> and mounting this filesystems causes the system to hang (I think I once even got
> it mounted, but it did hang shortly after when dd started writing to it).
> 
> I just ran some quick tests and (at least at first sight) it looks
> like the raid5/6
> code may cause the trouble:
> 
> I created different btrfs filesystem types, mounted them and (if possible)
> did a big "dd" on the filesystem:
> 
> mkfs.btrfs /dev/cciss/c1d* -m raid0 -d raid0 -f -> no problem (only short test)
> mkfs.btrfs /dev/cciss/c1d* -m raid1 -d raid1 -f -> no problem (only short test)
> mkfs.btrfs /dev/cciss/c1d* -m raid5 -d raid5 -f -> (almost) instant hang
> mkfs.btrfs /dev/cciss/c1d* -m raid6 -d raid6 -f -> (almost) instant
> hang (standard test)
> 
> Once the machine is up again I'll do some more testing (variing the combination
> of data and mdata raid levels)

Hmm, just FYI, raid5&6 works good on my box with 4.0.0 rc1.

Thanks,

-liubo

> 
> Bye,
>    Marcel
> 
> 
> 2015-03-03 7:37 GMT+01:00 Liu Bo <bo.li.liu@xxxxxxxxxx>:
> > On Tue, Mar 03, 2015 at 07:02:01AM +0100, Marcel Ritter wrote:
> >> Hi,
> >>
> >> yesterday I did a kernel update on my btrfs test system (Ubuntu
> >> 14.04.2) from custom-build kernel 3.19-rc6 to 4.0.0-rc1.
> >>
> >> Almost instantly after starting my test script, the system got stuck
> >> with soft lockups (the machine was running the very same test for
> >> weeks on the old kernel without problems,
> >> basically doing massive streaming i/o on a raid6 btrfs volume):
> >>
> >> I found 2 types of messages in the logs:
> >>
> >> one btrfs related:
> >>
> >> [34165.540004] INFO: rcu_sched detected stalls on CPUs/tasks: { 3 7}
> >> (detected by 6, t=6990777 jiffies, g=67455, c=67454, q=0)
> >> [34165.540004] Task dump for CPU 3:
> >> [34165.540004] mount           D ffff8803ed266000     0 15156  15110 0x00000000
> >> [34165.540004]  0000000000000158 0000000000000014 ffff8803ecc13718
> >> ffff8803ecc136d8
> >> [34165.540004]  ffffffff8106075a 0000000000000000 0000000000000002
> >> 0000000000000000
> >> [34165.540004]  00000000ecc13728 ffff8803eb603128 0000000000000000
> >> 0000000000000000
> >> [34165.540004] Call Trace:
> >> [34165.540004]  [<ffffffff8106075a>] ? __do_page_fault+0x2fa/0x440
> >> [34165.540004]  [<ffffffff810608d1>] ? do_page_fault+0x31/0x70
> >> [34165.540004]  [<ffffffff81792778>] ? page_fault+0x28/0x30
> >> [34165.540004]  [<ffffffff810ae2ce>] ? pick_next_task_fair+0x53e/0x880
> >> [34165.540004]  [<ffffffff810ae2ce>] ? pick_next_task_fair+0x53e/0x880
> >> [34165.540004]  [<ffffffff8109707c>] ? dequeue_task+0x5c/0x80
> >> [34165.540004]  [<ffffffff8178b9a3>] ? __schedule+0xf3/0x960
> >> [34165.540004]  [<ffffffff8178c247>] ? schedule+0x37/0x90
> >> [34165.540004]  [<ffffffffa0896375>] ?
> >> btrfs_start_ordered_extent+0xd5/0x110 [btrfs]
> >> [34165.540004]  [<ffffffff810b3cb0>] ? prepare_to_wait_event+0x110/0x110
> >> [34165.540004]  [<ffffffffa0896884>] ?
> >> btrfs_wait_ordered_range+0xc4/0x120 [btrfs]
> >> [34165.540004]  [<ffffffffa08c0c18>] ?
> >> __btrfs_write_out_cache+0x378/0x470 [btrfs]
> >> [34165.540004]  [<ffffffffa08c104a>] ? btrfs_write_out_cache+0x9a/0x100 [btrfs]
> >> [34165.540004]  [<ffffffffa086af79>] ?
> >> btrfs_write_dirty_block_groups+0x159/0x560 [btrfs]
> >> [34165.540004]  [<ffffffffa08f2aa6>] ? commit_cowonly_roots+0x18d/0x2a4 [btrfs]
> >> [34165.540004]  [<ffffffffa087bd31>] ?
> >> btrfs_commit_transaction+0x521/0xa50 [btrfs]
> >> [34165.540004]  [<ffffffffa08a3fbe>] ? btrfs_create_uuid_tree+0x5e/0x110 [btrfs]
> >> [34165.540004]  [<ffffffffa087963f>] ? open_ctree+0x1dff/0x2200 [btrfs]
> >> [34165.540004]  [<ffffffffa084f7ce>] ? btrfs_mount+0x75e/0x8f0 [btrfs]
> >> [34165.540004]  [<ffffffff811ecbf9>] ? mount_fs+0x39/0x180
> >> [34165.540004]  [<ffffffff81192405>] ? __alloc_percpu+0x15/0x20
> >> [34165.540004]  [<ffffffff812082bb>] ? vfs_kern_mount+0x6b/0x120
> >> [34165.540004]  [<ffffffff8120afe4>] ? do_mount+0x204/0xb30
> >> [34165.540004]  [<ffffffff8120bc0b>] ? SyS_mount+0x8b/0xe0
> >> [34165.540004]  [<ffffffff817905ed>] ? system_call_fastpath+0x16/0x1b
> >> [34165.540004] Task dump for CPU 7:
> >> [34165.540004] kworker/u16:1   R  running task        0 14518      2 0x00000008
> >> [34165.540004] Workqueue: btrfs-freespace-write
> >> btrfs_freespace_write_helper [btrfs]
> >> [34165.540004]  0000000000000200 ffff8803eac6fdf8 ffffffffa08ac242
> >> ffff8803eac6fe48
> >> [34165.540004]  ffffffff8108b64f 00000000f1091400 0000000000000000
> >> ffff8803eca58000
> >> [34165.540004]  ffff8803ea9ed3c0 ffff8803f1091418 ffff8803f1091400
> >> ffff8803eca58000
> >> [34165.540004] Call Trace:
> >> [34165.540004]  [<ffffffffa08ac242>] ?
> >> btrfs_freespace_write_helper+0x12/0x20 [btrfs]
> >> [34165.540004]  [<ffffffff8108b64f>] ? process_one_work+0x14f/0x420
> >> [34165.540004]  [<ffffffff8108be08>] ? worker_thread+0x118/0x510
> >> [34165.540004]  [<ffffffff8108bcf0>] ? rescuer_thread+0x3d0/0x3d0
> >> [34165.540004]  [<ffffffff81091212>] ? kthread+0xd2/0xf0
> >> [34165.540004]  [<ffffffff81091140>] ? kthread_create_on_node+0x180/0x180
> >> [34165.540004]  [<ffffffff8179053c>] ? ret_from_fork+0x7c/0xb0
> >> [34165.540004]  [<ffffffff81091140>] ? kthread_create_on_node+0x180/0x180
> >>
> >>
> >> and one general (related to "native_flush_tlb_other":
> >>
> >> [34152.604004] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
> >> [rs:main Q:Reg:490]
> >> [34152.604004] Modules linked in: btrfs(E) xor(E) radeon(E) ttm(E)
> >> drm_kms_helper(E) kvm(E) drm(E) raid6_pq(E) i2c_algo_bit(E) ipmi_si
> >> (E) amd64_edac_mod(E) serio_raw(E) hpilo(E) hpwdt(E) edac_core(E)
> >> shpchp(E) k8temp(E) mac_hid(E) edac_mce_amd(E) nfsd(E) auth_rpcgss(E
> >> ) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) lp(E)
> >> parport(E) hpsa(E) pata_acpi(E) hid_generic(E) psmouse(E) usbhid(E) b
> >> nx2(E) cciss(E) hid(E) pata_amd(E)
> >> [34152.604004] CPU: 6 PID: 490 Comm: rs:main Q:Reg Tainted: G      D W
> >>   EL  4.0.0-rc1-custom #1
> >> [34152.604004] Hardware name: HP ProLiant DL585 G2   , BIOS A07 05/02/2011
> >> [34152.604004] task: ffff8803eecd9910 ti: ffff8803ecb30000 task.ti:
> >> ffff8803ecb30000
> >> [34152.604004] RIP: 0010:[<ffffffff810f1e3a>]  [<ffffffff810f1e3a>]
> >> smp_call_function_many+0x20a/0x270
> >> [34152.604004] RSP: 0018:ffff8803ecb33cf8  EFLAGS: 00000202
> >> [34152.604004] RAX: 0000000000000000 RBX: ffffffff81cdd140 RCX: ffff8803ffc19700
> >> [34152.604004] RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
> >> [34152.604004] RBP: ffff8803ecb33d38 R08: ffff8803ffd961c8 R09: 0000000000000004
> >> [34152.604004] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
> >> [34152.604004] R13: ffff880300000040 R14: ffff8803ecb33ca0 R15: ffff8803ecb33ca8
> >> [34152.604004] FS:  00007f9cf6dae700(0000) GS:ffff8803ffd80000(0000)
> >> knlGS:0000000000000000
> >> [34152.672920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [34152.672920] CR2: 00007f9ce80091a8 CR3: 00000003e50fe000 CR4: 00000000000007e0
> >> [34152.672920] Stack:
> >> [34152.672920]  00000001ecb33de8 0000000000016180 00007f9cf0024fff
> >> ffff8803eb726900
> >> [34152.672920]  ffff8803eb726bd0 00007f9cf0025000 00007f9cf0021000
> >> 0000000000000004
> >> [34152.672920]  ffff8803ecb33d68 ffffffff8106722e ffff8803ecb33d68
> >> ffff8803eb726900
> >> [34152.672920] Call Trace:
> >> [34152.672920]  [<ffffffff8106722e>] native_flush_tlb_others+0x2e/0x30
> >> [34152.672920]  [<ffffffff81067354>] flush_tlb_mm_range+0x64/0x170
> >> [34152.672920]  [<ffffffff8119e66e>] tlb_flush_mmu_tlbonly+0x7e/0xe0
> >> [34152.672920]  [<ffffffff8119eed4>] tlb_finish_mmu+0x14/0x50
> >> [34152.672920]  [<ffffffff811a0cea>] zap_page_range+0xca/0x100
> >> [34152.672920]  [<ffffffff811b3993>] SyS_madvise+0x363/0x790
> >> [34152.672920]  [<ffffffff817905ed>] system_call_fastpath+0x16/0x1b
> >> [34152.672920] Code: 9d 5c 2b 00 3b 05 7b b2 c2 00 89 c2 0f 8d 83 fe
> >> ff ff 48 98 49 8b 4d 00 48 03 0c c5 40 b1 d1 81 f6 41 18 01 74 cb
> >>  0f 1f 00 f3 90 <f6> 41 18 01 75 f8 eb be 0f b6 4d c4 4c 89 fa 4c 89 f6 44 89 ef
> >>
> >> So I'm not totally sure if this is a btrfs problem, or something else
> >> got broken in 4.0.0-rc1.
> >>
> >> Maybe someone can have a look.
> >>
> >> If you need more information just let me know.
> >
> > Is it reproducible?
> >
> > From the stacks about btrfs, it's been stopped at mount stage, so it's likely to
> > be unrelated to btrfs.
> >
> > Thanks,
> >
> > -liubo
> >
> >>
> >> Bye,
> >>    Marcel
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux