Re: btrfs kernel workqueues performance regression

On 07/22/2014 07:39 PM, Dave Chinner wrote:
> On Tue, Jul 15, 2014 at 01:39:11PM -0400, Chris Mason wrote:
>> On 07/15/2014 11:26 AM, Morten Stevens wrote:
>>> Hi,
>>>
>>> I see that btrfs has been using kernel workqueues since Linux 3.15.
>>> After some tests I noticed performance regressions with fs_mark.
>>>
>>> mount options: rw,relatime,compress=lzo,space_cache
>>>
>>> fs_mark on Kernel 3.14.9:
>>>
>>> # fs_mark  -d  /mnt/btrfs/fsmark  -D  512  -t  16  -n  4096  -s  51200  -L5  -S0
>>> FSUse%        Count         Size    Files/sec     App Overhead
>>>      1        65536        51200      17731.4           723894
>>>      1       131072        51200      16832.6           685444
>>>      1       196608        51200      19604.5           652294
>>>      1       262144        51200      18663.6           630067
>>>      1       327680        51200      20112.2           692769
>>>
>>> The results are really nice! compress=lzo performs very well.
>>>
>>> fs_mark after upgrading to Kernel 3.15.4:
>>>
>>> # fs_mark  -d  /mnt/btrfs/fsmark  -D  512  -t  16  -n  4096  -s  51200  -L5  -S0
>>> FSUse%        Count         Size    Files/sec     App Overhead
>>>      0        65536        51200      10718.1           749540
>>>      0       131072        51200       8601.2           853050
>>>      0       196608        51200      11623.2           558546
>>>      0       262144        51200      11534.2           536342
>>>      0       327680        51200      11167.4           578562
>>>
>>> That's really a big performance regression :(
>>>
>>> What do you think? It's easy to reproduce with fs_mark.
>>
>> I wasn't able to trigger regressions here when we first merged it, but I
>> was sure that something would pop up.  fs_mark is sensitive to a few
>> different factors outside just the worker threads, so it could easily be
>> another change as well.
>>
>> With 16 threads, the btree locking also has a huge impact, and we've
>> made changes there too.
> 
> FWIW, I ran my usual 16-way fsmark test last week on my sparse 500TB
> perf test rig on btrfs. It sucked, big time, much worse than it's
> sucked in the past. It didn't scale past a single thread - 1 thread
> got 24,000 files/s, 2 threads got 25,000 files/s, and 16 threads got
> 22,000 files/s.

We had a trylock in the btree search code that always took the spinlock
but did a trylock on the blocking lock.  This was changed to a trylock
on the spinlock too because some of the callers were using trylock
differently than in the past.

It's a regression for this kind of run, but makes the btrfs locking much
less mystical.  I'm fixing up the performance regression part for the
next merge window, but I didn't want to mess around too much with it in
3.16 with all the other locking churn.

For this kind of fsmark run the best results still come from one subvol
per thread.
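
A minimal sketch of what a one-subvol-per-thread run could look like,
reusing the fs_mark parameters quoted above. It only prints the commands
(a dry run) so they can be reviewed before anything touches the
filesystem. The gen_cmds helper and the sv0..svN subvolume names are
made up for illustration. It also assumes that fs_mark, given several -d
options and no -t, runs one worker per directory; check that against
your fs_mark version.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for a one-subvolume-per-thread
# fs_mark run. Nothing is executed against the filesystem here.
#
# gen_cmds MOUNTPOINT NTHREADS
#   MOUNTPOINT  btrfs mount point (hypothetical default below)
#   NTHREADS    number of workers, one subvolume each
gen_cmds() {
    mnt="$1"
    nthreads="$2"
    dirs=""
    i=0
    while [ "$i" -lt "$nthreads" ]; do
        # Each subvolume has its own fs tree, so the workers should not
        # all be hitting the same btree root.
        echo "btrfs subvolume create $mnt/sv$i"
        dirs="$dirs -d $mnt/sv$i"
        i=$((i + 1))
    done
    # Assumption: with several -d options and no -t, fs_mark runs one
    # worker per directory. Other flags are copied from the runs above.
    echo "fs_mark$dirs -D 512 -n 4096 -s 51200 -L5 -S0"
}

# Illustrative defaults: the mount point and 16-way size used in this thread.
gen_cmds "${MNT:-/mnt/btrfs}" "${NTHREADS:-16}"
```

Piping the output through sh (as root, on a scratch filesystem) would
create the subvolumes and start the run.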

-chris