Re: [PATCH 3/3] revision: insert unsorted, then sort in prepare_revision_walk()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 2, 2012 at 09:24, Martin Fick <mfick@xxxxxxxxxxxxxx> wrote:
> On Saturday, March 31, 2012 04:11:01 pm René Scharfe wrote:
>> Speed up prepare_revision_walk() by adding commits
>> without sorting to the commit_list and at the end sort
>> the list in one go.  Thanks to mergesort() working
>> behind the scenes, this is a lot faster for large
>> numbers of commits than the current insert sort.
>
> This speeds up my git push test on my repo with ~100K refs
> case from out ~43s to about ~10s.  Not bad, thanks!
>
> The rest of the 10s do not seem to be spent with high CPU on
> either the pushing or the receiving side (only a very small
> 100% burst on both sides near the end of the operation).  I
> also ran iotop on the receiving side and could not find any
> activity (of course, the repo is likely cached).  iftop does
> show a decent amount of traffic during this time, so perhaps
> we are finally approaching the protocol limit?

The protocol is basically two round trips, receive side tells push
side what it has, push side sends data, receive side sends
success/error response. It would be more traffic with SSH due to the
encryption and custom ACK messages that SSH runs to wrap the stream.

> But, I have my doubts on that to be honest.  The reason is
> because I am able to hack Gerrit to receive this push much
> faster (around 3.5s) by reusing a cached RevWalk.  Without
> the cached RevWalk, Gerrit (using jgit) is about the same as
> your new patch ~10s.  I am not saying that git is spending
> its time in the same place (but it may be) as jgit, but with
> jgit, the time I was able to save with the cached RevWalk
> was the time spent loading and parsing the RevCommits.  This
> could be the same thing that git is doing?  And while it may
> not be I/O (disk) bound so to speak since the packs are
> likely cached, it may still be memory bound on that I/O?  If
> it is memory bound, and not I/O(disk) or CPU bound, I guess
> it makes sense that git and jgit would perform about the
> same (10s)?

Git can't really do the same thing as "cache the RevWalk". Its
spawning a new process that needs to decompress and parse each commit
object to determine its timestamp so the commits can be sorted into
the priority queue. This is still an O(N) operation given N
references.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]