On 07/06/18 10:37, Juergen Sauer wrote: ..
> Moving a virtual machine from an SSD/raid1 subvolume (nocow) to the big
> rotational store (nocow) fails. Once the page cache (RAM) has filled up,
> throughput drops to 0 kB/s. In the end the copy of a huge file hangs and
> makes no further progress; the load keeps rising and IOPS fall to zero.
>
> In the kernel log I find:
>
> [ 491.151952] INFO: task kworker/u16:28:1027 blocked for more than 120 seconds.
> [ 491.151953] Tainted: P O 4.17.3-1-ARCH #1
> [ 491.151953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 491.151953] kworker/u16:28 D 0 1027 2 0x80000000
> [ 491.151965] Workqueue: btrfs-endio-raid56 btrfs_endio_raid56_helper [btrfs]
> [ 491.151965] Call Trace:
> [ 491.151967] ? __schedule+0x282/0x890
> [ 491.151969] schedule+0x32/0x90
> [ 491.151970] io_schedule+0x12/0x40
> [ 491.151971] blk_mq_get_tag+0x146/0x2a0
This has nothing to do with btrfs; it is simply one of the remaining (but already fixed upstream) bugs in the blk-mq stack, probably related to sbitmap concurrency and/or "tag starvation". I could give you a list of patches from 4.18+ that (reliably) help, but I suppose you're not into kernel patching, so the easiest way for you would be to switch to the old block layer (e.g. by booting with the kernel flag scsi_mod.use_blk_mq=0) and use deadline/cfq as before.

This should all be fixed and work reliably with 4.18+; it looks like blk-mq will also be enabled by default by 4.19.

cheers
Holger
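
To check whether a disk is actually on the blk-mq path before and after rebooting with scsi_mod.use_blk_mq=0, something along these lines might help. It is only a rough sketch, not a tested tool: it assumes the usual sysfs layout, i.e. that /sys/block/<dev>/queue/scheduler lists the schedulers with the active one in square brackets, and that a blk-mq device has an "mq" directory under /sys/block/<dev>. The function names (parse_scheduler, uses_blk_mq, report) are made up for illustration.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a queue/scheduler line such as 'noop deadline [cfq]'.

    Returns (active, available); active is None if no name is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def uses_blk_mq(dev_dir):
    """Assumption: blk-mq devices expose an 'mq' directory in sysfs."""
    return (Path(dev_dir) / "mq").is_dir()


def report(sys_block="/sys/block"):
    """Return one summary line per block device that has a scheduler file."""
    root = Path(sys_block)
    if not root.is_dir():
        return []
    lines = []
    for dev in sorted(root.iterdir()):
        sched_file = dev / "queue" / "scheduler"
        try:
            active, available = parse_scheduler(sched_file.read_text())
        except OSError:
            continue  # no scheduler file, or not readable
        path = "blk-mq" if uses_blk_mq(dev) else "legacy"
        lines.append(
            f"{dev.name}: {path}, scheduler={active}, "
            f"available={','.join(available)}"
        )
    return lines


if __name__ == "__main__":
    # The parameter file only exists on kernels that still have this option.
    param = Path("/sys/module/scsi_mod/parameters/use_blk_mq")
    if param.is_file():
        print("scsi_mod.use_blk_mq =", param.read_text().strip())
    for line in report():
        print(line)
```

With the old block layer active, the affected disks should be reported as "legacy" with deadline or cfq among the available schedulers.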
