Re: Deadlock on 3.18.5

On 2015-02-05 10:24, Juergen Fitschen wrote:

On 05 Feb 2015, at 13:47, Austin S Hemmelgarn <ahferroin7@xxxxxxxxx> wrote:

I've actually seen similar behavior without the virtualization when doing large, filesystem-intensive operations with compression enabled.
I don't know if this is significant, but it seems to be worse with lzo compression than with zlib, and also seems to be worse when compression is enabled at the filesystem level instead of through 'chattr +c'.
Zlib isn't as fast as lzo, so zlib creates a bottleneck at the CPU and thereby limits the I/O the volume is exposed to. So our problem might be related to intensive operations on the volume.
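
For reference, the two ways of enabling compression mentioned above look roughly like this (device, mount point, and paths are placeholders):

    # whole-filesystem compression via a mount option (zlib or lzo)
    mount -o compress=lzo /dev/sdX /mnt/data

    # per-file/per-directory compression via the inode attribute
    chattr +c /mnt/data/some-directory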

I'm not certain, but I think it might have something to do with the somewhat brain-dead default parameters in the default I/O scheduler (the so-called 'completely fair queue', which as I've said before was obviously named by a mathematician and not based on its actual behavior), although it seems to be much worse when using the deadline and no-op I/O schedulers.
Good idea. I had a look at the configuration of the "stack" for the block devices and their queuing and caching. My setup looks like this (all default settings; I made no adjustments):

* 2 HDDs
* Hewlett-Packard Company Smart Array Gen8 Controllers (rev 01)
   [With 1GB write cache. Other black magic seems to be included. Combines both HDDs into a RAID 1.]
* Block device driver
* IO Scheduler: deadline
* LVM
* QEMU
   [With writeback cache. Should I change it to "none"? The storage controller has a write cache built in.]
* virtio-blk
* btrfs

As you can see, only one I/O scheduler is involved. The VM by default seems not to use any I/O scheduler. I checked this by executing "cat /sys/block/vd*/queue/scheduler" on the VM, and it reported "none".
Yeah, thankfully Linux is smart enough to turn off the I/O scheduler for block devices that it can see are virtualized.

At the very least, I would suggest changing QEMU to not use caching. I've found that host-side caching for virtualized block devices tends to just make things slower unless the block device is imported over the network (i.e., iSCSI/ATAoE/NBD). This is especially significant when you have a storage controller with such a big write cache (I would make sure that the write cache on the storage controller is non-volatile first, though; if it isn't, you should probably use writethrough mode for QEMU's caching).
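
Assuming the disk is attached directly on the QEMU command line (the image path and the rest of the options are placeholders), the cache mode is set per -drive, roughly:

    # no host-side page cache; leaves caching to the controller and the guest
    qemu-system-x86_64 ... -drive file=/path/to/guest.img,if=virtio,cache=none

    # safer fallback if the controller's write cache turns out to be volatile
    qemu-system-x86_64 ... -drive file=/path/to/guest.img,if=virtio,cache=writethrough

If the VM is managed through libvirt instead, the equivalent setting should be the cache attribute on the disk's <driver> element in the domain XML.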

Additionally, you might want to try using CFQ for the I/O scheduler on the host side, albeit with some non-default parameters (the deadline scheduler tends to get very laggy with really heavy random-access workloads). I've found that it does do well when you actually take the time to fine-tune things. The particular parameters I would suggest some experimentation with for CFQ are:
* Under /sys/block/<device>/queue: nomerges, rq_affinity, max_sectors_kb
* Under /sys/block/<device>/queue/iosched: group_idle, quantum, slice_idle, back_seek_max, back_seek_penalty

There is good information on what each of these does in the kernel sources under Documentation/block/queue-sysfs.txt and Documentation/block/cfq-iosched.txt. Once you find a set of parameters that works well, I'd suggest writing some simple udev rules to automatically set them on boot/device enumeration (a rough sketch follows below).
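
A minimal udev rule along those lines might look something like the following; the file name, the sd[a-z] match, and the particular values are just placeholders for whatever the experimentation settles on:

    # /etc/udev/rules.d/60-iosched.rules (name is arbitrary)
    # switch whole disks to CFQ and apply a couple of tuned values
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="cfq", ATTR{queue/nomerges}="1", ATTR{queue/iosched/slice_idle}="0"

After editing the rule, "udevadm control --reload-rules" followed by "udevadm trigger" should apply it without a reboot.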

FWIW, I've found that the following parameters provide near optimal performance for the SSD in my laptop:
queue/nomerges=1
queue/rq_affinity=2
queue/max_sectors_kb=16384 (16MB, which is 4x the erase-block size on the SSD)
queue/iosched/group_idle=8
queue/iosched/quantum=128 (this also happens to be equal to the device's NCQ queue depth)
queue/iosched/slice_idle=0
Using these settings, the time from the boot loader handing off execution to the kernel to having a login prompt is about 45 seconds. With the default CFQ parameters, it takes almost 150 seconds, so fine-tuning here can provide a very noticeable performance improvement.
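
For quick experimentation before committing anything to udev rules, the same values can be applied at runtime (the device name is a placeholder, and the numbers are just the ones quoted above for my SSD, not a general recommendation):

    # run as root, substituting the real device for sdX
    echo cfq   > /sys/block/sdX/queue/scheduler
    echo 1     > /sys/block/sdX/queue/nomerges
    echo 2     > /sys/block/sdX/queue/rq_affinity
    echo 16384 > /sys/block/sdX/queue/max_sectors_kb
    echo 8     > /sys/block/sdX/queue/iosched/group_idle
    echo 128   > /sys/block/sdX/queue/iosched/quantum
    echo 0     > /sys/block/sdX/queue/iosched/slice_idle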




