Re: [PATCH 3/3] revision: insert unsorted, then sort in prepare_revision_walk()

On Tue, Apr 3, 2012 at 10:49 AM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
>> Has anyone looked seriously at a new index format that stores the
>> redundant information in a more easily accessible way? It would increase
>> our disk usage, but for something like linux-2.6, only by 10MB per
>> 32-bit word. On most of my systems I would gladly spare some extra RAM
>> for the disk cache if it meant I could avoid inflating a bunch of
>> objects. And this could easily be made optional for systems that don't
>> want to make the tradeoff (if it's not there, you fall back to the
>> current procedure; we could even store the data in a separate file to
>> retain indexv2 compatibility).
>>
>> So it's sort-of a cache, in that it's redundant with the actual data.
>> But staleness and writing issues are a lot simpler, since it only gets
>> updated when we index the pack (and the pack index in general is a
>> similar concept; we are "caching" the location of the object in the
>> packfile, rather than doing a linear search to look it up each time).
>
> I think I have something like that, (generate a machine-friendly
> commit cache per pack, staying in $GIT_DIR/objects/pack/ too). It's
> separate cache staying in $GIT_DIR/objects/pack, just like pack-.idx
> files. It does improve rev-list time, but I'd rather wait for packv4,
> or at least be sure that packv4 will not come anytime soon, before
> pushing the cache route.

When I looked at a commit cache for rev-list, I tried to cache trees too,
but the resulting cache was too big. I managed to shrink the tree cache
down and measured the performance gain. Sorry, no code here because
it's ugly, just numbers, but you can look at the cache generation code
at [1].

On linux-2.6.git, with one 521MB pack, it generates a 356MB cache and a
30MB companion index. If you are willing to pay an extra 5 seconds
for decompression, the cache can go down to 94MB. With the
uncompressed cache, "rev-list --objects --all" time drops by more
than half:

$ time ~/w/git/git rev-list --objects --all --quiet </dev/null
real    2m31.310s
user    2m28.735s
sys     0m1.604s

$ time TREE_CACHE=cache ~/w/git/git rev-list --objects --all --quiet </dev/null
real    1m6.810s
user    1m6.091s
sys     0m0.708s

$ time ~/w/git/git rev-list --all --quiet </dev/null
real    0m14.261s  # should be cut down to one third with commit cache
user    0m14.088s
sys     0m0.171s
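
For reference, a quick sketch of the arithmetic behind the three 'real'
times above. The helper name to_seconds() is made up for this example;
the only inputs are the numbers already quoted in the runs.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m31.310s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if m is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

# 'real' times copied from the three runs above
no_cache = to_seconds("2m31.310s")      # rev-list --objects --all
tree_cache = to_seconds("1m6.810s")     # same command with TREE_CACHE=cache
commits_only = to_seconds("0m14.261s")  # rev-list --all (no trees or blobs)

speedup = no_cache / tree_cache
saved = 1 - tree_cache / no_cache
# Time the cached run still spends beyond a plain commit walk
remaining = tree_cache - commits_only

print(f"speedup {speedup:.2f}x, {saved:.0%} of wall time saved")
print(f"left over after the commit walk: {remaining:.1f}s")
```

The last figure is only a rough bound: it assumes the commit walk costs
the same in both commands, which these runs do not measure.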

Not really good. Getting "rev-list --objects" under 30s would be
nicer. With the cache on, lookup_object() is at the top of the 'perf'
report. Not sure what to do about it.
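
A toy model of why a lookup like this can dominate once inflating
objects is out of the way. It assumes the in-memory object table is an
open-addressing hash keyed by object name, probed once for every tree
entry the walk visits. ObjectTable and its methods are hypothetical
names for illustration, not git's code.

```python
import hashlib

class ObjectTable:
    """Toy open-addressing hash table keyed by 20-byte object names."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot: None or (oid, kind)
        self.count = 0
        self.probes = 0             # slots inspected by lookup()

    def _start(self, oid):
        # Object names are already hashes, so a prefix works as the hash value.
        return int.from_bytes(oid[:4], "big") % len(self.slots)

    def lookup(self, oid):
        i = self._start(oid)
        while True:
            self.probes += 1
            slot = self.slots[i]
            if slot is None:
                return None
            if slot[0] == oid:
                return slot[1]
            i = (i + 1) % len(self.slots)  # linear probing

    def insert(self, oid, kind):
        # Keep the table at most half full so probe chains stay short.
        if 2 * (self.count + 1) > len(self.slots):
            self._grow()
        i = self._start(oid)
        while self.slots[i] is not None:
            if self.slots[i][0] == oid:
                self.slots[i] = (oid, kind)
                return
            i = (i + 1) % len(self.slots)
        self.slots[i] = (oid, kind)
        self.count += 1

    def _grow(self):
        old = [s for s in self.slots if s is not None]
        self.slots = [None] * (2 * len(self.slots))
        self.count = 0
        for oid, kind in old:
            self.insert(oid, kind)

table = ObjectTable()
names = [hashlib.sha1(b"object %d" % n).digest() for n in range(1000)]
for oid in names:
    table.insert(oid, "tree")

# The same object is reached through many trees, so lookups far
# outnumber inserts during a walk.
table.probes = 0
for _ in range(10):
    for oid in names:
        assert table.lookup(oid) == "tree"
print("average probes per lookup:", table.probes / (10 * len(names)))
```

Each probe is cheap, but in a table this large most of them are likely
cache misses, and the walk issues one per tree entry.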

[1] https://gist.github.com/2310819
-- 
Duy