Issue on BTRFS/copy of really huge files

Hi!

I have been seeing an issue since kernel 4.17.x.
My setup for dealing with virtual machines is:
btrfs SSD/raid1 	[sda, sdb] ca. 250 GiB as OS root [Arch Linux]
btrfs rotational/raid5 	[sdc, sdd, sde] ca. 12 GiB as virtual machine store

My workflow:
If a very I/O-intensive operation has to be done on a virtual machine, I
move it onto a subvolume on the root fs, which sits on the fast SSD
raid1 setup.

Working on both the SSD and the rotational btrfs stack is fine and
performs well (nocow is used).
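
For reference, nocow here is the No_COW file attribute (the "C" flag set
with chattr +C and shown by lsattr). A rough Python sketch for checking
whether a file carries it follows. It assumes a 64-bit Linux, where the
FS_IOC_GETFLAGS ioctl number is 0x80086601; the helper names has_nocow
and file_is_nocow are made up for illustration and are not part of any
tool.

```python
import fcntl
import os
import struct

# Assumption: 64-bit Linux. _IOR('f', 1, long) evaluates to this value there.
FS_IOC_GETFLAGS = 0x80086601
# No_COW inode flag from linux/fs.h (the "C" attribute in lsattr output).
FS_NOCOW_FL = 0x00800000


def has_nocow(flags):
    """Return True if the No_COW bit is set in an inode flags word."""
    return bool(flags & FS_NOCOW_FL)


def file_is_nocow(path):
    """Read the inode flags of `path` and test the No_COW bit.

    Raises OSError on filesystems that do not support the ioctl.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(8)  # room for a long; the kernel fills in an int
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
        (flags,) = struct.unpack("I", bytes(buf[:4]))
        return has_nocow(flags)
    finally:
        os.close(fd)
```

On btrfs the flag only takes effect for files that are empty when it is
set, or that are created in a directory that already carries it, so a VM
image copied into a plain directory may silently end up with COW enabled.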

But now the little issue.

Moving a virtual machine from the SSD/raid1 subvolume (nocow) into the
big rotational store (nocow) fails.
Once the cache memory (RAM) has filled up, the data flow drops to
0 kB/sec.
As a result the copy of a huge file hangs and does not proceed any
further; the load keeps rising while IOPS fall to zero. In the kernel
log I find:

[  491.151952] INFO: task kworker/u16:28:1027 blocked for more than 120 seconds.
[  491.151953]       Tainted: P           O      4.17.3-1-ARCH #1
[  491.151953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  491.151953] kworker/u16:28  D    0  1027      2 0x80000000
[  491.151965] Workqueue: btrfs-endio-raid56 btrfs_endio_raid56_helper [btrfs]
[  491.151965] Call Trace:
[  491.151967]  ? __schedule+0x282/0x890
[  491.151969]  schedule+0x32/0x90
[  491.151970]  io_schedule+0x12/0x40
[  491.151971]  blk_mq_get_tag+0x146/0x2a0
[  491.151972]  ? wait_woken+0x80/0x80
[  491.151974]  blk_mq_get_request+0x342/0x420
[  491.151975]  blk_mq_make_request+0x121/0x670
[  491.151976]  generic_make_request+0x187/0x370
[  491.151977]  submit_bio+0x45/0x140
[  491.151988]  ? rbio_orig_end_io+0xd0/0xd0 [btrfs]
[  491.151998]  finish_rmw+0x392/0x530 [btrfs]
[  491.152009]  normal_work_helper+0xbd/0x350 [btrfs]
[  491.152010]  process_one_work+0x1d1/0x3b0
[  491.152011]  worker_thread+0x2b/0x3d0
[  491.152012]  ? process_one_work+0x3b0/0x3b0
[  491.152014]  kthread+0x112/0x130
[  491.152015]  ? kthread_flush_work_fn+0x10/0x10
[  491.152016]  ret_from_fork+0x35/0x40
[root@pc6 ~]#

The scheduler was deadline on the SSD discs and bfq on the rotational
discs.

After this "copy data flow zero rate" event, copying to the big raid
fails and sync never comes back. The machine load rises to 50 .. 60 ..
80 .. (seen in top).
The I/O load seen in iotop is zero.

Rebooting via the SysRq key works; the system comes back and works fine.

After I switched the scheduler to "kyber" on all discs (including the
rotational discs) the issue no longer occurs. ?!?
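
For reference, the scheduler of each disc can be read from and written to
/sys/block/<dev>/queue/scheduler, where the active one is shown in square
brackets. A minimal Python sketch of such a switch follows. It assumes
root privileges for the write; the helper names parse_schedulers and
set_scheduler are hypothetical, and the sysfs_root parameter exists only
so the code can be tried against a scratch directory.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split sysfs text such as 'mq-deadline kyber [bfq] none' into
    (active, available); the active scheduler is the bracketed one."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def set_scheduler(dev, wanted, sysfs_root="/sys/block"):
    """Switch `dev` (e.g. 'sdc') to scheduler `wanted`.

    Returns the previously active scheduler. Note: a scheduler built as a
    module that is not loaded yet may be missing from the list, so this
    check is stricter than the kernel itself.
    """
    path = Path(sysfs_root) / dev / "queue" / "scheduler"
    active, available = parse_schedulers(path.read_text())
    if wanted not in available:
        raise ValueError(f"{wanted!r} not offered for {dev}: {available}")
    if active != wanted:
        path.write_text(wanted)  # needs root on the real sysfs
    return active
```

Run as root, calling set_scheduler(dev, "kyber") for each of sda to sde
would correspond to the switch described above. The setting does not
survive a reboot.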

with kind regards
Jürgen Sauer
-- 
Jürgen Sauer - automatiX GmbH,
+49-4209-4699, juergen.sauer@xxxxxxxxxxxx
Geschäftsführer: Jürgen Sauer,
Gerichtstand: Amtsgericht Walsrode • HRB 120986
Ust-Id: DE191468481 • St.Nr.: 36/211/08000
GPG Public Key zur Signaturprüfung:
http://www.automatix.de/juergen_sauer_publickey.gpg
