Search squid archive

Re: Re: What are recommended settings for optimal sharing of cache between SMP workers?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 18, 2014 at 9:28 PM, Alex Rousskov
<rousskov@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 02/18/2014 04:11 PM, Rajiv Desai wrote:
>
>> It seems like there is a bug where an object overwrites previously
>> written object:
>
> This is not really a bug -- hash collisions do happen, as you have
> discovered and Rock store resolves them by overwriting older entries if
> possible (or not caching the newer ones otherwise). The smaller your
> cache, the more collisions you will have for the same number of unique
> cache keys.
>
> How many total slots does your rock cache_dir have?

13107198 (200 GB with 16KB slots).

I think I was thrown off by the hit % in mgr:info stats. Are these
stats accurate when using 8 SMP workers?
Is the % underreported? I couldnt find an absolute value of
hits/misses from stats. :/
Any pointers?

>
>
>> Needs a better hash as pointed out by the TODO :)
>> I will send out a patch.
>
> Thank you. No hash function will eliminate collisions though. If
> somebody wants to eliminate side-effects of hash collisions on recently
> added objects, they would have to implement a different anchor
> allocation algorithm that resolves hash collisions through chains (for
> example). I doubt that would be easy though!
>
>

Yes, I realized that as I read the code further. I will try to give it
a shot but I agree that this will be non trivial.
Currently sharding data across multiple cache_dirs will reduce
collisions, correct?

>> Is the key guaranteed to be a null terminated string?
>
> No. The key is a 128-bit opaque buffer, essentially.
>
>
>> I intend to use std::tr1::hash
>
> Please keep in mind that the key is an MD5 sum already. See
> store_key_md5.cc. I am not a hashing expert by any means, but the MD5
> sum should not really need much further hashing.
>
> It is possible that the k[0]+k[1] sum in the code below should be
> removed in favor of either k[1] or k[0] alone, but again, this may not
> have any significant effect (except for any specific case where the two
> given URLs used to collide).
>

I tried k[0] ^ k[1] which seemed to provide fewer collisions (67
collisions in 39129 GETs).
Haven't quantified and compared i detail though.


>
>> <code>
>> sfileno
>> Ipc::StoreMap::anchorIndexByKey(const cache_key *const key) const
>> {
>>     const uint64_t *const k = reinterpret_cast<const uint64_t *>(key);
>>     // TODO: use a better hash function
>>     return (k[0] + k[1]) % shared->limit;
>> }
>> </code>
>
>
> HTH,
>
> Alex.
>




[Index of Archives]     [Linux Audio Users]     [Samba]     [Big List of Linux Books]     [Linux USB]     [Yosemite News]

  Powered by Linux