Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)

On Tue, May 22, 2012 at 05:59:29AM +0200, Michael Haggerty wrote:

> >Try doing "git fetch . refs/*:refs/*" in a repository with a large
> >number of refs (e.g., 400K). With git v1.7.10, this takes about 9.5s on
> >my machine. But using the version of git in "next" takes about 16.5s.
> >Bisection points to your 432ad41 (refs: store references hierarchically,
> >2012-04-10). Perf shows sort_ref_dir and msort_with_tmp as hot spots.
> 
> I'm looking into this.
> 
> For your test, were the 400k references all in one "directory", or
> were they sharded somehow?

Sharded. This was the rails network repo, so it basically looks like
this:

  refs/remotes/XXXXXXX/heads/master
  refs/remotes/XXXXXXX/tags/v1.0
  refs/remotes/XXXXXXX/tags/v1.1
  ...
  refs/remotes/XXXXXXX/tags/v3.5
  refs/remotes/YYYYYYY/heads/master
  refs/remotes/YYYYYYY/tags/v1.0
  refs/remotes/YYYYYYY/tags/v1.1
  ...

Roughly the same 90 or so refs you would have in a normal repository
(a branch or two, and a bunch of tags), repeated over and over under
numbered remote hierarchies; each remote is essentially a copy of the
refs from somebody's fork of the repo.

I can provide you with the exact repo if you want, but it is about 100M.
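If a synthetic version is easier to experiment with, something along
these lines (a rough sketch, untested here; the 4000 x ~100 counts, the
ref names, and the dummy commit are just stand-ins) should produce a
similarly sharded ~400K-ref layout by writing packed-refs directly:

  # fresh bare repo with one dummy commit for all refs to point at
  git init --bare sharded.git && cd sharded.git
  tree=$(git mktree </dev/null)
  commit=$(echo dummy | git commit-tree $tree)

  # ~4000 "remotes", each with a master branch and ~100 tags,
  # written straight into packed-refs (sorted by refname)
  for remote in $(seq -f '%07g' 1 4000); do
    echo "$commit refs/remotes/$remote/heads/master"
    for tag in $(seq 1 100); do
      echo "$commit refs/remotes/$remote/tags/v$tag"
    done
  done | sort -k2 >packed-refs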

Interestingly, I don't seem to get nearly as much slow-down in my "fake"
many-refs repo, which has all 100K entries directly in refs/heads.
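For comparison, the flat case is just the same trick with everything
dumped into one directory (branch names made up), timed with the same
fetch from my earlier report:

  # in a second bare repo set up like the one above,
  # put 100K refs directly under refs/heads
  for i in $(seq 1 100000); do
    echo "$commit refs/heads/branch-$i"
  done | sort -k2 >packed-refs

  time git fetch . 'refs/*:refs/*'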

-Peff

