Re: Low IOOP Performance

On 2017-02-27 14:15, John Marrett wrote:
Liubo correctly identified direct IO as a solution for my test
performance issues; with it in use I achieved 908 read and 305 write
IOPS, not quite as fast as ZFS but more than adequate for my needs. I
then applied Peter's recommendation of switching to raid10 and tripled
performance again, up to 3000 read and 1000 write IOPS.

I do not understand why the page cache had such a large negative
impact on performance; it seems like it should have no impact, or help
slightly with caching, rather than severely impacting both read and
write IO. Is this expected behaviour, and what is the real-world
impact on applications that don't use direct IO?
Generally yes, it is expected behavior, but it's not really that high an impact for most things that don't use direct IO, since such applications usually:
1. Care more about bulk streaming throughput than IOPS.
2. Aren't performance-sensitive enough for it to matter.
3. Actually need the page cache for performance reasons (the Linux page cache does pretty well for read-heavy workloads with consistent access patterns).

If you look, you should actually see lower bulk streaming throughput with direct IO than without on most devices, especially when dealing with ATA or USB disks, since the page cache functionally reduces the number of IO requests that get sent to the device even when all the data is new. The read-ahead it uses to achieve this only works for purely or mostly sequential workloads, though, so it ends up being detrimental to random-access or very sparsely sequential workloads, which in turn are the workloads that usually care about IOPS over streaming throughput.
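
You can see the effect for yourself by running the same sequential read with and without direct IO, something like the following (file name, size and queue depth are just placeholders, adjust for your setup):

  # drop the page cache first so the buffered run isn't just reading from RAM (as root):
  echo 3 > /proc/sys/vm/drop_caches
  # buffered sequential read, with the page cache and read-ahead in play:
  fio --name=seqread-buffered --ioengine=libaio --rw=read --bs=1M --size=4G --iodepth=16 --direct=0 --filename=seqtest
  # the same workload bypassing the page cache:
  fio --name=seqread-direct --ioengine=libaio --rw=read --bs=1M --size=4G --iodepth=16 --direct=1 --filename=seqtest

On most spinning disks the buffered run will post the higher throughput, because read-ahead and request merging batch up the work before it ever hits the device.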

With regard to RAID10, my understanding is that I can't mix drive
sizes and use their full capacity on a RAID10 volume. My current
server runs a mixture of drive sizes and I am likely to need to do so
again in the future. Can I do this and still enjoy the performance
benefits of RAID10?
In theory, you should be fine. BTRFS will use narrower stripes when it has to, as long as it has at least 4 disks to put data on. If you can make sure you have an even number of drives of each size (and ideally an even total number of drives), you should get close to full utilization. Keep in mind though that as the FS gets more and more full (and the stripes therefore get narrower), you'll start to see odd, seemingly arbitrary performance differences based on what you're accessing.
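
To illustrate (the device names and mount point here are hypothetical), a mixed-size raid10 filesystem is created like any other, and you can check afterwards how the space on each drive is actually being used:

  mkfs.btrfs -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
  mount /dev/sdb /mnt
  btrfs filesystem usage /mnt
  btrfs device usage /mnt

The usage output shows per-device allocation, so you can see how close you're getting to full utilization as the filesystem fills up.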

That said, if you can manage to just use an even number of identically sized disks, you can get even more performance by running BTRFS in raid1 mode on top of two LVM or MD RAID0 volumes. That will give you the same data safety as BTRFS raid10 mode, but depending on the workload it can increase performance pretty significantly (I see about a 10-20% difference currently, but I don't have any particularly write-intensive workloads). Note that doing so will improve sequential access performance more than random access, so it may not be worth the effort in your case.
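
A rough sketch of that layout with MD, assuming four identically sized disks (device names are placeholders):

  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
  mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdd /dev/sde
  mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1

BTRFS keeps one copy of everything on each RAID0 array, so a single disk failure still leaves a full copy of the data on the other array, which is where the equivalent data safety comes from.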

a ten disk raid1 using 7.2k 3 TB SAS drives

Those are really low IOPS-per-TB devices, but a good choice for
SAS, as they will have SCT/ERC.


I don't expect the best IOPS performance from them; they are intended
for bulk data storage. However, the results I had previously didn't
seem acceptable or normal.


I strongly suspect that we have a different notion of "IOPS",
perhaps either logical vs. physical IOPS, or randomish vs.
sequentialish IOPS. I'll have a look at your attachments in more
detail.


I did not achieve 650 MB/s with random IO, nor do I expect to; it was
a sequential write of 250 GB performed using dd with the conv=fsync
option to ensure that all writes were complete before the write speed
was reported.


For comparison with another checksumming filesystem, I created a ZFS
pool using the same layout and measured IOPS rates of 4315 read and
1449 write with sync enabled (without sync it's clearly just writing
to RAM); sequential performance was comparable to btrfs.

It seems unlikely to me that you got that with a 10-device
mirror 'vdev'; most likely you configured it as a stripe of 5x
2-device mirror vdevs, that is, RAID10.


This is correct; it was a RAID10 across 5 mirrored volumes.
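
For reference, that ZFS layout is the one you get by listing mirror pairs when creating the pool, something like this (pool and device names are placeholders):

  zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf \
    mirror /dev/sdg /dev/sdh mirror /dev/sdi /dev/sdj

Writes are striped across the five mirror vdevs, which is why it behaves like RAID10 rather than a single 10-way mirror.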

Thank you both very much for your help with my testing,

-JohnF

RAID1 Direct IO Test Results

johnf@altered-carbon:/btrfs/johnf$ fio --randrepeat=1
--ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k
--iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --direct=1
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [8336KB/2732KB/0KB /s] [2084/683/0
iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=12270: Mon Feb 27 11:49:04 2017
  read : io=784996KB, bw=3634.6KB/s, iops=908, runt=215981msec
  write: io=263580KB, bw=1220.4KB/s, iops=305, runt=215981msec
  cpu          : usr=1.50%, sys=8.18%, ctx=244134, majf=0, minf=116
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=784996KB, aggrb=3634KB/s, minb=3634KB/s, maxb=3634KB/s,
mint=215981msec, maxt=215981msec
  WRITE: io=263580KB, aggrb=1220KB/s, minb=1220KB/s, maxb=1220KB/s,
mint=215981msec, maxt=215981msec


RAID10 Direct IO Test Results

johnf@altered-carbon:/btrfs/johnf$ fio --randrepeat=1
--ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k
--iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --ba=4k
--direct=1
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [16136KB/5312KB/0KB /s]
[4034/1328/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=12644: Mon Feb 27 13:50:35 2017
  read : io=784996KB, bw=12003KB/s, iops=3000, runt= 65401msec
  write: io=263580KB, bw=4030.3KB/s, iops=1007, runt= 65401msec
  cpu          : usr=3.66%, sys=19.54%, ctx=188302, majf=0, minf=22
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=784996KB, aggrb=12002KB/s, minb=12002KB/s, maxb=12002KB/s,
mint=65401msec, maxt=65401msec
  WRITE: io=263580KB, aggrb=4030KB/s, minb=4030KB/s, maxb=4030KB/s,
mint=65401msec, maxt=65401msec