Re: slow single -> raid1 conversion (heavy write to original LVM volume)

On Tue, Jan 14, 2020 at 10:41 AM jakub nantl <jn@xxxxxxxxxx> wrote:
>
> hello,
>
> thank for reply, here is the call trace, no need to reboot it, so I am
> waiting :)
>
> [538847.101197] sysrq: Show Blocked State
> [538847.101206]   task                        PC stack   pid father
> [538847.101321] btrfs           D    0 16014      1 0x00004004
> [538847.101324] Call Trace:
> [538847.101335]  __schedule+0x2e3/0x740
> [538847.101339]  ? __switch_to_asm+0x40/0x70
> [538847.101342]  ? __switch_to_asm+0x34/0x70
> [538847.101345]  schedule+0x42/0xb0
> [538847.101348]  schedule_timeout+0x203/0x2f0
> [538847.101351]  ? __schedule+0x2eb/0x740
> [538847.101355]  io_schedule_timeout+0x1e/0x50
> [538847.101358]  wait_for_completion_io+0xb1/0x120
> [538847.101363]  ? wake_up_q+0x70/0x70
> [538847.101401]  write_all_supers+0x896/0x960 [btrfs]
> [538847.101426]  btrfs_commit_transaction+0x6ea/0x960 [btrfs]
> [538847.101456]  prepare_to_merge+0x210/0x250 [btrfs]
> [538847.101484]  relocate_block_group+0x36b/0x5f0 [btrfs]
> [538847.101512]  btrfs_relocate_block_group+0x15e/0x300 [btrfs]
> [538847.101539]  btrfs_relocate_chunk+0x2a/0x90 [btrfs]
> [538847.101566]  __btrfs_balance+0x409/0xa50 [btrfs]
> [538847.101593]  btrfs_balance+0x3ae/0x530 [btrfs]
> [538847.101621]  btrfs_ioctl_balance+0x2c1/0x380 [btrfs]
> [538847.101648]  btrfs_ioctl+0x836/0x20d0 [btrfs]
> [538847.101652]  ? do_anonymous_page+0x2e6/0x650
> [538847.101656]  ? __handle_mm_fault+0x760/0x7a0
> [538847.101662]  do_vfs_ioctl+0x407/0x670
> [538847.101664]  ? do_vfs_ioctl+0x407/0x670
> [538847.101669]  ? do_user_addr_fault+0x216/0x450
> [538847.101672]  ksys_ioctl+0x67/0x90
> [538847.101675]  __x64_sys_ioctl+0x1a/0x20
> [538847.101680]  do_syscall_64+0x57/0x190
> [538847.101683]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [538847.101687] RIP: 0033:0x7f3cb04c85d7
> [538847.101695] Code: Bad RIP value.
> [538847.101697] RSP: 002b:00007ffcd4e5fe88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [538847.101701] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f3cb04c85d7
> [538847.101704] RDX: 00007ffcd4e5ff18 RSI: 00000000c4009420 RDI: 0000000000000003
> [538847.101707] RBP: 00007ffcd4e5ff18 R08: 0000000000000078 R09: 0000000000000000
> [538847.101710] R10: 0000559f27675010 R11: 0000000000000246 R12: 0000000000000003
> [538847.101713] R13: 00007ffcd4e62734 R14: 0000000000000001 R15: 0000000000000000
> [538847.101718] btrfs           D    0 30196      1 0x00000004
> [538847.101720] Call Trace:
> [538847.101724]  __schedule+0x2e3/0x740
> [538847.101727]  schedule+0x42/0xb0
> [538847.101753]  btrfs_cancel_balance+0xf8/0x170 [btrfs]
> [538847.101759]  ? wait_woken+0x80/0x80
> [538847.101786]  btrfs_ioctl+0x13af/0x20d0 [btrfs]
> [538847.101789]  ? do_anonymous_page+0x2e6/0x650
> [538847.101793]  ? __handle_mm_fault+0x760/0x7a0
> [538847.101797]  do_vfs_ioctl+0x407/0x670
> [538847.101800]  ? do_vfs_ioctl+0x407/0x670
> [538847.101803]  ? do_user_addr_fault+0x216/0x450
> [538847.101806]  ksys_ioctl+0x67/0x90
> [538847.101809]  __x64_sys_ioctl+0x1a/0x20
> [538847.101813]  do_syscall_64+0x57/0x190
> [538847.101856]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [538847.101859] RIP: 0033:0x7fa33680c5d7
> [538847.101864] Code: Bad RIP value.
> [538847.101873] RSP: 002b:00007ffdbe2b9c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [538847.101888] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fa33680c5d7
> [538847.101897] RDX: 0000000000000002 RSI: 0000000040049421 RDI: 0000000000000003
> [538847.101908] RBP: 00007ffdbe2ba1d8 R08: 0000000000000078 R09: 0000000000000000
> [538847.101918] R10: 00005604500f4010 R11: 0000000000000246 R12: 00007ffdbe2ba735
> [538847.101928] R13: 00007ffdbe2ba1c0 R14: 0000000000000000 R15: 0000000000000000
>

I think it got clipped, and the MUA is wrapping it, which makes it hard
to read. I suggest 'journalctl -k -o short-monotonic', because whatever
started the problem might actually be much earlier, and there's no way
to know that without the entire log. Put that up on Dropbox, a pastebin,
Google Drive, or equivalent, and hopefully a dev will be able to figure
out why it's hung. All I can tell from the above is that it's hung up on
cancelling, which doesn't say much.
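
Something along these lines should capture the whole kernel log for this
boot with monotonic timestamps (the output file name is just an example):

  journalctl -k -o short-monotonic > btrfs-hang-dmesg.txt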

__handle_mm_fault is suspicious. On second thought, I suggest doing
sysrq+t, then capturing the output of 'journalctl -k' and posting that.
It'll have the complete dmesg, the sysrq+w, and the +t. That definitely
shouldn't go to the list directly; it'll be too long, and the way MUAs
wrap it makes it hard to read.
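
For the record, sysrq can also be triggered from a root shell via
/proc/sysrq-trigger, so something like this would do it (the output file
name is again just an example):

  echo t > /proc/sysrq-trigger    # dump the state of every task
  journalctl -k > btrfs-hang-dmesg.txt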

-- 
Chris Murphy


