Re: Quick bcache benchmark

Sorry, I was thinking about that issue for a while and then I got distracted...

It's not user error; it's an irritating corner case. Basically, it's
the result of a workaround for a particularly obscure data corruption
race.

If a write bypasses the cache, it has to invalidate that region of the
cache; the null key it leaves in the cache will block cache misses
from adding that data to the cache until the btree node fills up (and
possibly splits).
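
To make that concrete, here is a toy model of the behaviour described
above. It is only a sketch and not bcache code: the ToyNode class, its
methods and the "capacity" parameter are all made up for illustration,
and dropping null keys when the node is rewritten is my simplification
of "until the btree node fills up".

```python
# Toy model only: not bcache code. All names here are hypothetical.
class ToyNode:
    def __init__(self, capacity):
        self.capacity = capacity  # keys appended before the node is rewritten
        self.keys = {}            # region -> "data", or None for a null key
        self.appended = 0         # keys appended since the last rewrite

    def _append(self, region, value):
        self.keys[region] = value
        self.appended += 1
        if self.appended >= self.capacity:
            self._rewrite()

    def _rewrite(self):
        # Stand-in for the node filling up (and possibly splitting):
        # null keys are dropped, so they stop blocking cache misses.
        self.keys = {r: v for r, v in self.keys.items() if v is not None}
        self.appended = 0

    def bypass_write(self, region):
        # A write that bypasses the cache invalidates the region
        # by leaving a null key behind.
        self._append(region, None)

    def cache_miss(self, region):
        # A read miss tries to add the data; a null key blocks it.
        if region in self.keys and self.keys[region] is None:
            return False
        self._append(region, "data")
        return True


node = ToyNode(capacity=4)
node.bypass_write(7)
blocked = node.cache_miss(7)        # False: the null key is in the way
for region in (1, 2, 3):            # other activity fills the node
    node.cache_miss(region)
unblocked = node.cache_miss(7)      # True: the null key is gone
```

With little other activity the node never reaches its capacity, so the
miss on region 7 stays blocked indefinitely.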

It hasn't been an issue for us in normal operation, but when you're
just testing - i.e. you don't have much load - that node split may not
happen for a long time, and so if for some reason a bunch of data
bypassed the cache... well, you see what happens.

Unfortunately a better solution to the original race is not going to
be simple, so it's probably not going to be done in the very near
future. It's a _very_ difficult race to hit, but in the meantime I'd
rather lose performance than corrupt data.

But the good news is that if you put normal server-ish load on it, the
issue should go away in steady-state operation.
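
If you want to watch for that, something like the following could
track the counters from the stats directory quoted below. It is a
rough sketch: the helper names are made up, and treating
cache_miss_collisions as "misses that failed to get added to the
cache" is my assumption about that counter, not something established
in this thread.

```python
from pathlib import Path


def read_stats(stats_dir):
    """Return {file name: contents} for every readable file in stats_dir."""
    stats = {}
    for path in sorted(Path(stats_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            stats[path.name] = path.read_text().strip()
        except OSError:
            # Some sysfs attributes cannot be read; skip them.
            continue
    return stats


def collision_fraction(stats):
    """Fraction of cache misses that were counted as collisions."""
    misses = int(stats["cache_misses"])
    collisions = int(stats["cache_miss_collisions"])
    return collisions / misses if misses else 0.0
```

For the numbers quoted below (48879 collisions out of 80545 misses)
that comes to roughly 0.61. Under steadier load I would expect the
fraction to drop over time.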

On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
>> That keeps the 'bypassed' value from increasing, but it doesn't change
>> write performance.
>> [root@sansrv2-10 stats_day]# cat *
>> 27.6M
>> 83
>> 3500
>> 0
>> 166
>> 24380
>> 40660
>> 0
>> ...benchmarking...
>> [root@sansrv2-10 stats_day]#  for i in `ls`; do echo -n "$i "; cat $i;
>>> done 2>/dev/null
>> bypassed 27.6M
>> cache_bypass_hits 83
>> cache_bypass_misses 3500
>> cache_hit_ratio 0
>> cache_hits 410
>> cache_miss_collisions 48879
>> cache_misses 80545
>> cache_readaheads 0
>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>> average_key_size 0
>> block_size 2.0k
>> btree_cache_size 3.2M
>> bucket_size 1.0M
>> cache_available_percent 100
>> clear_stats congested 0
>> congested_threshold_us 0
>> dirty_data 0
>> io_error_halflife 0
>> io_error_limit 8
>> root_usage_percent 0
>> synchronous 1
>> tree_depth 1
>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>> <kent.overstreet@xxxxxxxxx> wrote:
>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>> writing direct to /dev/bcache0 and get the same result.
>>> Weird. From what you're describing it sounds like throttling is screwed
>>> up (and it was recently), but I can't reproduce it now.
>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>> and seeing if that fixes it?
>>>> There also seems to be some work needed with clean-up. Since I'm
>>>> unfamiliar with how bcache works, I attempted to make-bcache twice,
>>>> thinking I'd start over. That worked, but because my cache device was
>>>> already registered I was unable to re-register my newly formatted
>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>> don't try to register things with the same name in the same
>>>> directory." I was still able to use my cache device via the old uuid,
>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>> file in /sys/fs/bcache would help. I also tried rmmod'ing bcache to
>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>> give some sort of instruction on how to unregister, or do it for you
>>>> if you reformat.
>>> Yeah, I think for some reason bcache isn't opening the devices
>>> exclusively on 3.1. I'll have a look...
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at