Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/22/2012 07:35 PM, Junio C Hamano wrote:
The current code reads the whole thing in upon first use of _any_ element
in the file, just like the index codepath does for the index file.

But the calling pattern to the refs machinery is fairly well isolated and
all happens in refs.c file.  Especially thanks to the recent work by
Michael Haggerty, for "I am about to create a new branch 'frotz'; do I
have 'refs/heads/frotz' or anything that begins with 'refs/heads/frotz/'?"
kind of callers, it is reasonably easy to design a better structured
packed-refs file format to allow us to read only a subtree portion of
refs/ hierarchy, and plug that logic into the lazy ref population code.
Such a "design a better packed-refs format for scalability to 400k refs"
is a very well isolated project that has high chance of succeeding without
breaking things.  No code outside refs.c assumes that there is a flat
array of refs that records what was read from the packed-refs file and can
walk linearly over it, unlike the in-core index.

Even with the current file format, it would not be so difficult to bisect the file, synchronizing on record boundaries by looking for the next NL character. Because of the way the file is sorted, it would also be reasonably efficient to read whole subtrees in one slurp (e.g., for for_each_ref() with a prefix argument). Nontrivial modifications would of course not be possible without a rewrite.

There would need to be some intelligence built-in; after enough single-reference accesses come in a row, then the refs module should probably take it upon itself to read the whole packed-refs file to speed up further lookups.

If you do "for_each_ref()" for everything (e.g. sending 'have' during the
object transfer, or repacking the whole repository), you would end up
needing to read the whole thing for obvious reasons.

Yes. ISTM that any hope to avoid O(number of refs) problems when exchanging commits must involve using more intelligence about how references are related to each other topologically to improve the negotiation about what needs to be transferred.

Michael

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]