On 2019/4/3 5:09 PM, Nikolay Borisov wrote:
>
>
> On 3.04.19 at 11:54, Qu Wenruo wrote:
>> Hi,
>>
>> Recently the Intel LKP performance test has been reporting a btrfs
>> performance regression.
>>
>> It points to the tree-checker code, and since I've been poking around
>> bcc/eBPF, I spent some time taking a closer look at the performance
>> penalty of both btrfs csum and the tree-checker.
>>
>> The code base is David's misc-next, which contains both the write-time
>> tree checker and the enhanced code to handle fuzzed images.
>>
>> The tool can be found in my gist:
>> https://gist.github.com/adam900710/b5542f2e52ed4687986cf41f64b85253
>
> So you are essentially trying to figure out the average run time of 3
> functions; this could have been done more simply with the funclatency
> bcc tool from the iovisor repo:
>
> https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
>
>
> Running this tool will show you a latency histogram, making it easier
> to spot any latency outliers. An average value doesn't mean anything
> without more context, such as the stddev.

That can easily be done in the Python part, although a histogram is a
better way to present it.

>
>
>>
>> To use the tool, one needs the bcc Python bindings and a kernel
>> configured for eBPF. At least the Arch default kernel has all the
>> needed config, so anyone can try it on Arch.
>>
>> The workload is:
>> mkfs.btrfs -n 4K $DEV
>> mount $DEV $MNT
>> fsstress -n 10000 -w -d $MNT
>> umount $MNT
>>
>> ## start my script ##
>> mount $DEV $MNT
>> ls -R $MNT > /dev/null # To read all fs tree blocks
>> fsstress -n 1000 -w -d $MNT # Trigger enough writes
>> umount $MNT
>> ## stop my script ##
>>
>>
>> The result is very interesting. The basic numbers are:
>> CSUM_TREE_BLOCK: nr=2311 total=10000612 avg=4327
>> TREE_CHECKER_READ: nr=461 total=41911553 avg=90914
>> TREE_CHECKER_WRITE: nr=1575 total=5783330 avg=3671
>
> Definitely something worth looking at. And it already exposes a bug.
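The point about an average hiding outliers, and the reply that the Python side can report more than a mean, can be illustrated with a small plain-Python sketch. This is not the code from the gist or from funclatency.py. It assumes the per-call latencies (in nanoseconds) have already been collected into a list, and the sample values are made up for illustration. It reports the mean, the population stddev, and power-of-two buckets similar in spirit to the log2 histogram funclatency prints.

```python
import statistics
from collections import Counter


def log2_bucket(ns: int) -> int:
    """Index k of the power-of-two bucket [2**k, 2**(k+1) - 1] holding ns.

    Values 0 and 1 are both placed in bucket 0.
    """
    return max(ns, 1).bit_length() - 1


def summarize(samples_ns):
    """Return (mean, population stddev, {bucket: count}) for latencies in ns."""
    mean = statistics.fmean(samples_ns)
    stddev = statistics.pstdev(samples_ns)
    hist = Counter(log2_bucket(s) for s in samples_ns)
    return mean, stddev, dict(sorted(hist.items()))


def print_histogram(hist, width=40):
    """Print one ASCII bar per non-empty bucket, scaled to the largest count."""
    peak = max(hist.values())
    for k, count in hist.items():
        lo, hi = 2 ** k, 2 ** (k + 1) - 1
        bar = "*" * max(1, count * width // peak)
        print(f"{lo:>8} -> {hi:<8} : {count:<5} |{bar}")


if __name__ == "__main__":
    # Hypothetical samples: nine fast calls around 4us and one 90us outlier.
    samples = [4000] * 9 + [90000]
    mean, stddev, hist = summarize(samples)
    print(f"avg={mean:.0f}ns stddev={stddev:.0f}ns")
    print_histogram(hist)
```

With these made-up samples the mean is 12.6μs, a latency no single call actually had, while the 25.8μs stddev and the two separate histogram buckets make the bimodal shape obvious at a glance.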
The write-time tree checker doesn't check leaf contents, which is why it's
so fast.

As for the slow read part, it's the empty root owner check, which I'll
definitely remove.

Thanks,
Qu

>
>>
>> So, looking only at the average, a csum calculation costs just
>> 3~5μs. And surprisingly, the write-time tree checker is even faster
>> than the checksum!
>>
>> Also surprisingly, the read-time tree checker takes nearly 100μs, more
>> than 20 times slower than csum or the write-time tree checker.
>>
>> So we have a new direction for enhancing tree-checker performance.
>> BTW, bcc/eBPF is really awesome!
>>
>> Thanks,
>> Qu
>>
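The figures in the basic result lines can be cross-checked with a little arithmetic. The sketch below assumes the gist reports times in nanoseconds (consistent with reading avg=4327 as roughly 4μs) and that avg is total divided by nr, rounded down. Both are assumptions, not something stated in the thread.

```python
# nr (number of calls) and total time, copied from the basic result lines.
# Unit assumed to be nanoseconds.
RESULTS = {
    "CSUM_TREE_BLOCK": (2311, 10000612),
    "TREE_CHECKER_READ": (461, 41911553),
    "TREE_CHECKER_WRITE": (1575, 5783330),
}


def averages(results):
    """Per-call average as total // nr, matching the avg= column."""
    return {name: total // nr for name, (nr, total) in results.items()}


avg = averages(RESULTS)
read_vs_csum = avg["TREE_CHECKER_READ"] / avg["CSUM_TREE_BLOCK"]
read_vs_write = avg["TREE_CHECKER_READ"] / avg["TREE_CHECKER_WRITE"]

for name, ns in avg.items():
    print(f"{name:<20} avg={ns:>6} ns (~{ns / 1000:.1f} us)")
print(f"read/csum  ratio: {read_vs_csum:.1f}x")
print(f"read/write ratio: {read_vs_write:.1f}x")
```

This reproduces the avg= column exactly and gives roughly 21x for the read-time checker against csum and roughly 25x against the write-time checker. It also shows the write-time checker (about 3.7μs) averaging slightly below csum (about 4.3μs).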
