On 2015-02-05 10:24, Juergen Fitschen wrote:
On 05 Feb 2015, at 13:47, Austin S Hemmelgarn <ahferroin7@xxxxxxxxx> wrote:
I've actually seen similar behavior without the virtualization when doing large filesystem intensive operations with compression enabled.
I don't know if this is significant, but it seems to be worse with lzo compression than zlib, and also seems to be worse when compression is enabled at the filesystem level instead of through 'chattr +c'.
Zlib isn't as performant as lzo, so zlib creates a bottleneck at the CPU and thereby limits the I/O the volume is exposed to. So our problem might be related to I/O-intensive operations on the volume.
I'm not certain, but I think it might have something to do with the somewhat brain-dead default parameters in the default I/O scheduler (the so-called 'completely fair queue', which as I've said before was obviously named by a mathematician and not based on its actual behavior), although it seems to be much worse when using the deadline and no-op I/O schedulers.
Good idea. I had a look at my configuration of the "stack" for the block devices and their queuing and caching. My setup looks like this (with default settings; I made no adjustments):
* 2 HDDs
* Hewlett-Packard Company Smart Array Gen8 Controllers (rev 01)
[With 1GB write cache. Other black magic seems to be included. Combines both HDDs into a RAID 1.]
* Block device driver
* IO Scheduler: deadline
* LVM
* QEMU
[With writeback cache. Should I change it to "none"? The storage controller already has a write cache.]
* virtio-blk
* btrfs
As you can see, only one I/O scheduler is involved. By default the VM does not seem to use any I/O scheduler; I checked this by running "cat /sys/block/vd*/queue/scheduler" in the VM and it reported "none".
Yeah, thankfully Linux is smart enough to turn off the I/O scheduler for
block devices that it can see are virtualized.
At the very least, I would suggest changing QEMU to not use caching.
I've found that host-side caching for virtualized block devices tends to
just make things slower unless the block device is imported over the
network (i.e. iSCSI/ATAoE/NBD). This is especially significant when you
have a storage controller with such a big write-cache (I would make sure
that the write-cache on the storage controller is non-volatile first
though, if it isn't you should probably use writethrough mode for QEMU's
caching).
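For reference, the cache mode is a per-drive option; from memory it looks roughly like this on a raw QEMU command line (the image path is obviously a placeholder), or the equivalent cache= attribute on the <driver> element if the VM is managed through libvirt:

  qemu-system-x86_64 ... \
    -drive file=/path/to/guest.img,format=raw,if=virtio,cache=none
  # or cache=writethrough, if the controller's write cache turns
  # out to be volatile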
Additionally, you might want to try using CFQ for the I/O scheduler on
the host side, albeit with some non-default parameters (the deadline
scheduler tends to get very laggy with really heavy random-access
workloads). I've found that it does do well when you take the
time to fine-tune things. The particular CFQ parameters I would suggest
experimenting with are:
* Under /sys/block/<device>/queue: nomerges, rq_affinity, max_sectors_kb
* Under /sys/block/<device>/queue/iosched: group_idle, quantum,
slice_idle, back_seek_max, back_seek_penalty
There is good information on what each of these does in the kernel
sources under Documentation/block/queue-sysfs.txt and
Documentation/block/cfq-iosched.txt.
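For quick experimentation before you commit to anything, you can just
poke values in through sysfs at runtime and re-run your workload after
each change (sda below is a placeholder for whatever the Smart Array
volume shows up as on the host; run as root):

  # switch the host-side scheduler to CFQ
  echo cfq > /sys/block/sda/queue/scheduler
  # the active scheduler is shown in brackets
  cat /sys/block/sda/queue/scheduler
  # then adjust individual parameters, e.g.:
  echo 2 > /sys/block/sda/queue/rq_affinity
  echo 0 > /sys/block/sda/queue/iosched/slice_idle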
Once you find a set of parameters that works well, I'd suggest writing
some simple udev rules to set them automatically at boot/device
enumeration (a rough example is sketched below, after the numbers from
my laptop).
FWIW, I've found that the following parameters provide near optimal
performance for the SSD in my laptop:
queue/nomerges=1
queue/rq_affinity=2
queue/max_sectors_kb=16387 (roughly 16MB, which is 4x the erase-block
size on the SSD)
queue/iosched/group_idle=8
queue/iosched/quantum=128 (this also happens to be equal to the device's
NCQ queue depth)
queue/iosched/slice_idle=0
Using these settings, the time from the boot-loader handing control to
the kernel until I get a login prompt is about 45 seconds. With the
default CFQ parameters, it takes almost 150 seconds, so fine-tuning here
can provide a very noticeable performance improvement.
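As an example of the udev approach mentioned above, the rule I use looks
roughly like the following (from memory; the file name is arbitrary, the
KERNEL== match needs to fit your device, and it assumes CFQ is already
the active scheduler so that the iosched/ attributes exist when the rule
runs). Note that a udev rule has to be on a single line; it is wrapped
here only to keep the mail readable:

  # /etc/udev/rules.d/60-io-tuning.rules
  ACTION=="add|change", KERNEL=="sda", SUBSYSTEM=="block",
    ATTR{queue/nomerges}="1", ATTR{queue/rq_affinity}="2",
    ATTR{queue/max_sectors_kb}="16387",
    ATTR{queue/iosched/group_idle}="8",
    ATTR{queue/iosched/quantum}="128",
    ATTR{queue/iosched/slice_idle}="0"

After editing the rule you can apply it without rebooting with
"udevadm control --reload-rules" followed by "udevadm trigger".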