Re: Software RAID checksum performance on 24 disks not even close to kernel reported
Not to interject too much here ...

On 06/07/2012 12:06 AM, Stan Hoeppner wrote:
> On 6/6/2012 11:09 AM, Dan Williams wrote:
>> Hardware raid ultimately does the same shuffling, outside of nvram an advantage it has is that parity data does not traverse the bus...
>
> Are you referring to the host data bus(s)? I.e. HT/QPI and PCIe? With a 24 disk array, a full stripe write is only 1/12th parity data, less than 10%. And the buses (point to point actually) of 24 drive caliber systems will usually start at one way B/W of 4GB/s for PCIe 2.0 x8 and with one way B/W from the PCIe controller to the CPU starting at
PCIe gen 2 is ~500MB/s per lane in each direction, but there's roughly a 14% protocol overhead, so your "sustained" streaming performance is more along the lines of 430 MB/s per lane. For a PCIe x8 gen 2 system, this nets you about 3.4GB/s in each direction.
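The arithmetic above is easy to redo for other link widths. A minimal Python sketch, assuming the ~500 MB/s raw per-lane figure and the flat ~14% overhead quoted here (real overhead varies with payload size and chipset); the function name is illustrative only:

```python
# Rough PCIe gen 2 sustained bandwidth, using the figures from the post.
# Assumptions: 500 MB/s raw per lane per direction (5 GT/s with 8b/10b
# encoding) and a flat ~14% protocol overhead; real overhead depends on
# payload size, chipset, and driver behavior.
RAW_MB_PER_LANE = 500.0
PROTOCOL_OVERHEAD = 0.14


def pcie_gen2_sustained_mb_s(lanes: int, overhead: float = PROTOCOL_OVERHEAD) -> float:
    """Approximate one-direction sustained streaming bandwidth in MB/s."""
    return lanes * RAW_MB_PER_LANE * (1.0 - overhead)


if __name__ == "__main__":
    for lanes in (1, 4, 8, 16):
        print(f"x{lanes:<2d} ~{pcie_gen2_sustained_mb_s(lanes):6.0f} MB/s per direction")
```

For x8 this works out to about 3440 MB/s, the ~3.4 GB/s figure above.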
> 10.4GB/s for AMD HT 3.0 systems. PCIe x8 is plenty to handle a 24 drive md RAID 6, using 7.2K SATA drives anyway.
Each modern drive can stream, say, 140 MB/s, so 24 x 140 MB/s ≈ 3.4 GB/s, which is already about the full sustained bandwidth of a PCIe x8 gen 2 link.
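A quick sketch of the same comparison, assuming 140 MB/s per drive, pure streaming, and the x8 gen 2 estimate worked out above. It also folds in the quoted point that a 24-drive RAID 6 full-stripe write is 2/24 parity: with md that parity crosses the host bus too, so the user-visible data rate sits a little below the aggregate drive rate. Function names are illustrative, not from any tool.

```python
# Compare aggregate drive streaming rate with the PCIe x8 gen 2 estimate.
# Assumptions: 140 MB/s per drive, pure streaming, full-stripe RAID 6 writes.
PER_DRIVE_MB_S = 140.0
LINK_MB_S = 8 * 500.0 * (1.0 - 0.14)   # ~3440 MB/s: x8 gen 2 after ~14% overhead


def aggregate_mb_s(drives: int, per_drive: float = PER_DRIVE_MB_S) -> float:
    """Total streaming rate if every drive runs flat out."""
    return drives * per_drive


def raid6_user_data_mb_s(drives: int, per_drive: float = PER_DRIVE_MB_S) -> float:
    """User data rate for full-stripe writes: 2 of every `drives` chunks are P and Q."""
    return aggregate_mb_s(drives, per_drive) * (drives - 2) / drives


if __name__ == "__main__":
    n = 24
    total = aggregate_mb_s(n)
    print(f"{n} drives: {total:.0f} MB/s aggregate vs ~{LINK_MB_S:.0f} MB/s link")
    print(f"link utilization: {100 * total / LINK_MB_S:.0f}%")
    print(f"user data at full-stripe writes: {raid6_user_data_mb_s(n):.0f} MB/s")
```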
This assumes pure streaming, with no seeks other than those inherent to streaming. That said, this is *not* a design pattern you'd want to follow, for a number of reasons.
But for seek-heavy workloads, you aren't going to get anywhere close to 140 MB/s per drive. We've just done a brief study for a customer on what they should expect to see (by measuring it and reporting on the measurement). For seekier loads, assume you will be off by close to an order of magnitude.
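To see why seeks hurt so much, here is a simple single-request service-time model (not the customer study mentioned above): each random request pays an average seek plus half a revolution before it transfers anything. The 8.5 ms average seek is an assumed figure for a 7.2K drive, and the model ignores command queuing and caching, so treat the output as a rough illustration only.

```python
# Simple random-I/O model for one drive: service time = seek + rotational
# latency + transfer. Assumptions: 8.5 ms average seek (hypothetical typical
# 7.2K figure), no NCQ/queuing, no cache hits, 140 MB/s media rate.

def random_io_mb_s(request_kib: float, avg_seek_ms: float = 8.5,
                   rpm: int = 7200, stream_mb_s: float = 140.0) -> float:
    """Approximate throughput in MB/s for back-to-back random requests."""
    request_bytes = request_kib * 1024
    rotational_ms = 0.5 * 60_000.0 / rpm             # half a revolution on average
    transfer_ms = request_bytes / (stream_mb_s * 1e6) * 1000.0
    service_ms = avg_seek_ms + rotational_ms + transfer_ms
    iops = 1000.0 / service_ms
    return iops * request_bytes / 1e6


if __name__ == "__main__":
    for kib in (4, 64, 256, 1024):
        mb_s = random_io_mb_s(kib)
        print(f"{kib:5d} KiB random: ~{mb_s:6.1f} MB/s "
              f"({140.0 / mb_s:5.1f}x below streaming)")
```

Larger requests amortize the positioning time, so the gap to streaming shrinks as request size grows, but in this model it never closes.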
Also, please note that iozone, dd, bonnie++, ... aren't great load generators, especially if the data is in cache. You tend to measure the upper layers of the file system stack, not the actual full stack performance. fio does a better job if you set the right options. That said, almost all of these codes measure at the front end of the stack; if you want to know what the disks are really doing, you have to start poking your head into the kernel's /proc and /sys spaces. What's interesting is that, of the tools mentioned, only fio appears to eventually converge its reporting to what the backend hardware does. The front-end measurements seem to do a pretty bad job of deciding when an I/O begins and when it completes. It could be an fsync or similar problem (discussed in the past), but it's very annoying. End users look at bonnie++ and other results and don't understand why performance in their use case is so different.
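One way to see what the disks themselves are doing, rather than what a benchmark's front end reports, is to sample the per-device sector counters in /proc/diskstats while the test runs. A minimal Python sketch, assuming a Linux host and the standard layout (after major, minor, and device name, the third and seventh statistics are sectors read and sectors written, always in 512-byte units). The helper names are hypothetical.

```python
import os
import time
from typing import Dict, Tuple

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Map device name -> (sectors_read, sectors_written)."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        # major, minor, name, reads, reads_merged, sectors_read, ms_reading,
        # writes, writes_merged, sectors_written, ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def rates_mb_s(before: Dict[str, Tuple[int, int]],
               after: Dict[str, Tuple[int, int]],
               seconds: float) -> Dict[str, Tuple[float, float]]:
    """Per-device (read MB/s, write MB/s) between two snapshots."""
    rates = {}
    for name, (r1, w1) in after.items():
        if name in before:
            r0, w0 = before[name]
            rates[name] = ((r1 - r0) * SECTOR_BYTES / 1e6 / seconds,
                           (w1 - w0) * SECTOR_BYTES / 1e6 / seconds)
    return rates


def sample(path: str = "/proc/diskstats", interval: float = 1.0):
    """Take two snapshots about `interval` seconds apart; return per-device rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return rates_mb_s(before, after, time.monotonic() - start)


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    for dev, (rd, wr) in sorted(sample().items()):
        if rd or wr:
            print(f"{dev:12s} read {rd:8.1f} MB/s   write {wr:8.1f} MB/s")
```

Comparing the md device's line with its member disks' lines should also expose the extra parity traffic that md sends to the drives.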
> What is a bigger issue, and may actually be what you were referring to, is read-modify-write B/W, which will incur a full stripe read and write. For RMW heavy workloads, this is significant. HBA RAID does have a big advantage here, compared to one's md array possessing the aggregate performance to saturate the PCIe bus.
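The bus-traffic difference in the quoted point can be made concrete with a small model. It assumes a single-chunk update is done as a reconstruct-write (read the other data chunks in the stripe, then write the new data plus P and Q), which matches the "full stripe read and write" described in the quote; md may pick a different strategy depending on kernel version and stripe state. "Hardware RAID" here simply means the controller performs the extra reads and parity writes behind the host bus. Function names are hypothetical.

```python
# Host-bus chunk transfers for updating ONE data chunk in an n-drive RAID 6
# stripe. Assumption: reconstruct-write (read all other data chunks,
# recompute P and Q), as in the quoted "full stripe read and write".

def rcw_chunk_ios(n_drives: int) -> dict:
    data_chunks = n_drives - 2          # RAID 6: two chunks per stripe are P and Q
    return {"reads": data_chunks - 1,   # every other data chunk in the stripe
            "writes": 1 + 2}            # new data chunk + P + Q


def host_bus_chunks(n_drives: int, software_raid: bool) -> int:
    ios = rcw_chunk_ios(n_drives)
    if software_raid:
        # md: every read and write crosses the host PCIe bus.
        return ios["reads"] + ios["writes"]
    # Hardware RAID: only the new data chunk crosses the host bus; the
    # extra reads and parity writes stay behind the controller.
    return 1


if __name__ == "__main__":
    for n in (6, 12, 24):
        print(f"{n:2d} drives: md moves {host_bus_chunks(n, True):2d} chunks over "
              f"the host bus, hardware RAID moves {host_bus_chunks(n, False)}")
```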
The big issues for most HBAs are the available bandwidth to the disks, the quality/implementation of the controllers/drivers, etc. Hanging 24 drives off a single controller is a low-cost design, not a high-performance design. You will get contention (especially with expander chips). You will get suboptimal performance.
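A back-of-the-envelope view of that contention, under a hypothetical topology: all 24 drives sit behind one SAS expander that connects to the HBA through a single x4 wide port at 6 Gb/s (600 MB/s per lane after 8b/10b encoding), with 140 MB/s per drive as above. It ignores SAS protocol overhead and the HBA's own PCIe link, so real numbers would likely be lower still.

```python
# Per-drive bandwidth when many drives share one expander uplink.
# Assumptions (hypothetical topology): 6 Gb/s SAS lanes at 600 MB/s each,
# a single x4 wide port to the HBA, 140 MB/s streaming per drive.
SAS2_LANE_MB_S = 600.0


def uplink_share(drives: int, uplink_lanes: int = 4,
                 drive_mb_s: float = 140.0) -> tuple:
    """Return (per-drive MB/s achievable, uplink MB/s, aggregate demand MB/s)."""
    uplink = uplink_lanes * SAS2_LANE_MB_S
    demand = drives * drive_mb_s
    per_drive = min(drive_mb_s, uplink / drives)
    return per_drive, uplink, demand


if __name__ == "__main__":
    per_drive, uplink, demand = uplink_share(24)
    print(f"demand {demand:.0f} MB/s vs uplink {uplink:.0f} MB/s -> "
          f"~{per_drive:.0f} MB/s per drive")
```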
Checksumming speed on the CPU will not be the bottleneck in most of these cases. Controller/driver performance and contention will be.
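For reference, the per-stripe "checksum" work is the RAID 6 P/Q syndrome: P is a plain XOR of the data chunks, and Q is a weighted sum over GF(2^8) with generator {02} and the polynomial 0x11d used by Linux md. A byte-at-a-time Python sketch follows; the kernel's SIMD routines (the ones it benchmarks and reports at boot) compute the same thing many bytes at a time. Function names are illustrative.

```python
from typing import List, Tuple


def gf_mul2(x: int) -> int:
    """Multiply a byte by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x


def raid6_pq(data: List[bytes]) -> Tuple[bytes, bytes]:
    """Compute P (XOR parity) and Q (Reed-Solomon syndrome) for equal-length chunks.

    Q = sum over i of g**i * D_i, evaluated with Horner's rule from the
    highest-numbered data chunk down to D_0.
    """
    size = len(data[0])
    p = bytearray(size)
    q = bytearray(size)
    for chunk in reversed(data):
        for j in range(size):
            p[j] ^= chunk[j]
            q[j] = gf_mul2(q[j]) ^ chunk[j]
    return bytes(p), bytes(q)


if __name__ == "__main__":
    stripe = [bytes([i] * 4) for i in range(1, 23)]   # 22 data chunks, as in a 24-drive RAID 6
    p, q = raid6_pq(stripe)
    print("P =", p.hex(), " Q =", q.hex())
```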
Back to your regularly scheduled thread ...

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@xxxxxxxxxxxxxxxxxxxxxxx
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615