Re: builder io issue
On 12/26/2011 06:43 AM, Brendan Conoboy wrote:
> On 12/25/2011 09:06 PM, Gordan Bobic wrote:
>> Why not just mount directly via NFS? It'd be a lot quicker, not to
>> mention easier to tune. It'd work for building all but a handful of
>> packages (e.g. zsh), but you could handle that by having a single
>> builder that uses a normal fs, with a policy pointing the packages
>> that fail their self-tests on NFS at it.
> I'm not acquainted with the rationale for the decision, so perhaps
> somebody else can comment. Beyond the packages that demand a local
> filesystem, perhaps there were issues with .nfsXXX files, or some
> stability problem not seen when working with a single open file?
> Not sure.
I have rebuilt the entire distro using mock on NFSv3 with noatime,nolock,proto=udp and had no problems at all, apart from zsh, which has to be on a local fs mounted with atime, and that is only so it passes its self-tests.
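For concreteness, the mount behind that setup might look like this (server name and export path are hypothetical placeholders):

```shell
# Hypothetical server/export names; a sketch of the NFSv3 mount
# described above. noatime avoids access-time write traffic, nolock
# skips NLM locking, and proto=udp trims per-RPC overhead on a
# reliable LAN.
mount -t nfs -o vers=3,noatime,nolock,proto=udp \
    nfsserver:/export/buildroots /var/lib/mock
```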
>> 512KB chunks sound vastly oversized for this sort of workload. But
>> if you are running ext4 on top of a loopback file on top of NFS, no
>> wonder the performance sucks.
> Well, 512KB chunks are oversized for traditional NFS use, but perhaps
> undersized for this unusual use case.
The problem with the current method is that you can pretty much guarantee you will not have proper alignment at any layer, which will cause considerable I/O imbalances and hot-spots.
>> Sounds like a better way to ensure that would be to re-architect the
>> storage solution more sensibly. If you really want to use block-level
>> storage, use iSCSI on top of raw partitions. Provided those
>> partitions are suitably aligned (e.g. for 4KB physical-sector disks,
>> erase block sizes, underlying RAID, etc.), your FS on top of those
>> iSCSI exports will also end up properly aligned, and the stride,
>> stripe-width and block group size will all still line up properly.
> I understand there was an issue with iSCSI stability about a year
> ago. One of our engineers tried it on his trimslice recently and had
> no problems, so it may be time to re-evaluate its use.
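A minimal sketch of carving out an aligned raw partition for such an iSCSI export (device name hypothetical; the iSCSI target configuration itself is omitted):

```shell
# Hypothetical device; starting the partition at 1MiB makes its offset
# a multiple of 4KB physical sectors and of common erase block and
# RAID chunk sizes, so a filesystem created on it stays aligned.
DEV=/dev/sdb
parted -s "$DEV" mklabel gpt
parted -s "$DEV" mkpart builder 1MiB 100%
parted -s "$DEV" unit s print   # partition should start at sector 2048
```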
Or you could just use bare NFS, unless somebody can provide a concrete example of why that won't work, combined with a single local-storage builder for the one or two packages that require a local fs. Having rebuilt 2000+ packages at least 3 times in the past month, I have found it not to be an issue. Whether some of the other 2000+ packages in Fedora have issues, I don't know, but it is not at all clear that there are enough packages with NFS problems to make it unworkable with a single iSCSI builder and a suitable build policy (similar to what is used for "heavy" builders).
>> But with 40 builders, each builder only hammering one disk, you'll
>> still get 10 builders hammering each spindle and causing a purely
>> random seek pattern. I'd be shocked if you saw any measurable
>> improvement from just splitting up the RAID.
> Let's say 10 (40/4) builders are using one disk at the same time.
> That's not necessarily a doomsday scenario, since their network speed
> is only 100Mbps. The one situation you want to avoid is having
> numerous mock setups running at once; that would amount to a
> hammering. How much time on average is spent composing the chroot vs.
> building? Sure, at some point the builders will simply overwhelm any
> given disk, but what is that point? My guess is that 10 is really
> pushing it; 5 would be better.
LD_PRELOAD=libeatmydata.so makes a _massive_ difference to the mock setup times when not using a cached, tarballed mock root image. Less so when using a cached mock root image, but it still makes a difference.
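As a sketch, wiring libeatmydata into a mock build looks something like this (the mock config name and SRPM are hypothetical placeholders):

```shell
# libeatmydata turns fsync()/fdatasync() into no-ops, which removes
# most of the rpm-database sync traffic during chroot population.
# "fedora-15-arm" and the SRPM name are placeholders, not real values
# from this thread.
LD_PRELOAD=libeatmydata.so mock -r fedora-15-arm --rebuild foo-1.0-1.src.rpm
```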
>> Using the fs image over loopback over NFS sounds so eye-wateringly
>> wrong that I'm just going to give up on this thread if that part is
>> immutable. I don't think the problem is significantly fixable if
>> that approach remains.
> Why is that?
Because you are virtually guaranteed to have a file system that is not aligned, especially WRT block groups.
>> I don't see why you think that seeking within a single disk is any
>> less problematic than seeking across multiple disks. That will only
>> happen when the file exceeds the chunk size, and that will typically
>> happen only at the end, when linking - there aren't many cases where
>> a single code file is bigger than a sensible chunk size (and in a
>> 4-disk RAID0 case, you're pretty much forced to use a 32KB chunk
>> size if you intend the block group beginnings to be distributed
>> across spindles).
> It's the chroot composition that makes me think seeking across
> multiple disks is an issue.
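The alignment arithmetic behind that 32KB figure can be sketched as follows (assuming a 4-disk RAID0 and 4KB filesystem blocks):

```shell
# ext4 alignment parameters for a 4-disk RAID0 with 32KB chunks:
#   stride       = chunk size / fs block size    (blocks per chunk)
#   stripe-width = stride * number of data disks (blocks per stripe)
CHUNK_KB=32
BLOCK_KB=4
NDISKS=4
STRIDE=$((CHUNK_KB / BLOCK_KB))
STRIPE_WIDTH=$((STRIDE * NDISKS))
echo "stride=$STRIDE stripe-width=$STRIPE_WIDTH"   # stride=8 stripe-width=32
# These would then be passed to mkfs, e.g. (device hypothetical):
# mkfs.ext4 -b 4096 -E stride=$STRIDE,stripe-width=$STRIPE_WIDTH /dev/md0
```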
If you are talking about untarring the cached mock rootfs, I doubt it. If you are talking about creating a mock root from scratch, then LD_PRELOAD=libeatmydata.so is what will make the biggest difference, due to all the rpm-database-induced I/O.
>> And local storage will be what? SD cards? There's only one model
>> line of SD cards I have seen to date that actually produces
>> random-write results that begin to approach a ~5000 rpm disk (up to
>> 100 IOPS), and those are SLC and quite expensive. Having spent the
>> last few months patching, fixing up and rebuilding RHEL6 packages
>> for ARM, I have a pretty good understanding of what works for
>> backing storage and what doesn't - and SD cards are not an approach
>> to take if performance is an issue. Even expensive, highly branded
>> Class 10 SD cards only manage ~20 IOPS (80KB/s) on random writes.
> 80KB/s? Really? That sounds like bad alignment.
That is with optimal alignment. It gets worse if you don't make sure the FS is aligned for erase block sizes. 80KB/s = 20 IOPS (random write) with 4KB blocks. A lot of SD cards do even worse than that.
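The arithmetic behind that 80KB/s figure, for reference:

```shell
# 20 random-write IOPS at 4KB per request:
IOPS=20
REQ_BYTES=4096
BYTES_PER_SEC=$((IOPS * REQ_BYTES))
echo "$((BYTES_PER_SEC / 1024))KB/s"   # prints 80KB/s
```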
On random reads it is not unusual to see 1000-1500 IOPS, but random-write performance on SD cards is pretty dire, and unfortunately the I/O-heavy tasks like rootfs setup (especially when not using the cached one) are almost pure writes.
>> I'm still not sure what the point is of using a loopback-ed file for
>> storage instead of raw NFS. NFS mounted with nolock,noatime,proto=udp
>> works exceedingly well for me with NFSv3.
> I didn't think udp was a good idea any longer.
It will give you a bit less network overhead, and if your network is reliable (i.e. no significant packet loss), it certainly won't be any worse than running over TCP.
>> Well, deadline is about favouring reads over writes. Writes you can
>> buffer as long as you have RAM to spare (especially with libeatmydata
>> LD_PRELOAD-ed). Reads, however, block everything until they complete.
>> So favouring reads over writes may well get you ahead in terms of
>> keeping the builders busy.
> It rather begs the question: what are the builders blocking on right
> now? I'd assumed chroot composition, which is rather write-heavy.
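For reference, switching a disk to the deadline elevator is a one-liner (device name hypothetical):

```shell
# Hypothetical device; the deadline scheduler bounds read latency,
# which is what actually stalls the builders while writes can sit in
# the page cache.
echo deadline > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler   # the selected elevator is bracketed
```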
That is certainly the most I/O bound part of the task.

Gordan
_______________________________________________
arm mailing list
arm@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/arm