Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

On Mon, May 21, 2012 at 11:51:16PM -0600, Martin Fick wrote:

> >Anyway, my point is that we don't even have to talk about "reasonable"
> >or "absurd". Git should be fast even on absurd cases, because 99% of
> >the work has already been done, and the last 1% is easy.
> I hope you are right, but I don't quite completely share your
> optimism.  Some of that last 1% is perhaps last exactly because it is
> hard.  More specificaly, I am talking about the git protocol's ref
> advertisement on connection.  This has been considered a known issue
> for many years, yet it has not been fixed because it is hard to fix
> since it requires breaking the protocol in a non backwards compatible
> way.  I would be delighted if you had an easy fix for this rather
> fundamental ref scaling issue? We talked with Junio and Shawn and they
> agreed that it would be reasonable to put forward a proposal which
> does break backwards compatibility. So if there is a chance that there
> still may be another way, I hope it is found before this gets underway
> (no, I don't really expect that to happen),

I may be overly optimistic, but I think I may also have not communicated
the limits to my optimism. Right now, it is the case that some
operations on many refs are just impossibly slow due to poor algorithm
selection in the implementation. I am optimistic that we can drop those
in favor of linear algorithms, or at least O(n lg n) algorithms in most
cases. And that would bring these cases down from "don't do that, it's
broken" to "it works, but obviously it's slower than not having a
zillion refs".

I am not nearly as optimistic about the work scaling better than linear.
That is going to take some clever algorithms, as well as possibly some
design tradeoffs. AFAIK, ref advertisement scales linearly. Which is
probably not acceptable over a slow link, and we could do better.

But in my mind, we are still doing the "make it at least linear" portion
of the process. People aren't really using repos with tons of refs
because they currently perform so poorly, so we don't yet have good data
on how to solve the "better than linear" problems. As the quadratic
problems clear up, hopefully the locations of (and solutions to) those
other problems will be more clear.

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at

[Newbies FAQ]     [Linux Kernel Development]     [Free Online Dating]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Free Online Dating]     [Photo]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]     [Linux Resources]

Add to Google