Re: Keeping unreachable objects in a separate pack instead of loose?

On Mon, Jun 11, 2012 at 02:34:14PM -0400, Jeff King wrote:
> You _could_ make a separate cruft pack for each pack that you repack. So
> if I have A.pack and B.pack, I'd pack all of the reachable objects into
> C.pack, and then make D.pack containing the unreachable objects from
> A.pack, and E.pack with the unreachable objects from B.pack. And then
> set the mtime of the cruft packs to that of their parent packs.
> 
> And then the next time you pack, repacking D and E would probably be a
> no-op that preserves mtime, but might create a new pack that ejects some
> now-reachable object.
> 
> To implement that, I think your --list-unreachable would just have to
> print a list of "<pack-mtime> <sha1>" pairs, and then you would pack
> each set with an identical mtime (or even a "close enough" mtime within
> some slop)....

How about this instead?  We distinguish between cruft packs and "real"
packs by the filename.  So we have "cruft-<SHA1>.{idx,pack}" and
"pack-<SHA1>.{idx,pack}".

Normally, git will look at any pack in the pack directory that has an
.idx and .pack extension, but during a repack operation, it will look
only in the pack-* packs first.  If it can't find an object there, it
will then fall back to trying to fetch the object from the cruft-*
packs, and if it finds the object, it copies it into the new pack
which it is creating, thus "rescuing" an object which reappears during
the expiry window.  This should be a relatively rare event, and if it
happens, the object will be in two packs, a pack-* pack and a cruft-*
pack, but that's OK.
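
To make the lookup order concrete, here is a rough sketch in Python.
It is only a model of the idea, not git code: packs are stood in for
by plain dicts mapping an object name to its data, and find_object()
and repack() are hypothetical names invented for this illustration.

```python
def find_object(sha1, real_packs, cruft_packs):
    """Look in the pack-* packs first, then fall back to cruft-* packs.

    Packs are modelled as dicts {sha1: data}; this is an assumption
    made for illustration, not how git stores packs.
    Returns (data, came_from_cruft).
    """
    for pack in real_packs:
        if sha1 in pack:
            return pack[sha1], False
    for pack in cruft_packs:
        if sha1 in pack:
            return pack[sha1], True
    raise KeyError(sha1)


def repack(reachable, real_packs, cruft_packs):
    """Build a new pack holding every reachable object.

    Objects found only in a cruft pack are "rescued": they are copied
    into the new pack and also stay in the cruft pack until it expires.
    """
    new_pack = {}
    rescued = []
    for sha1 in reachable:
        data, came_from_cruft = find_object(sha1, real_packs, cruft_packs)
        new_pack[sha1] = data
        if came_from_cruft:
            rescued.append(sha1)
    return new_pack, rescued
```

Objects that live only in a cruft pack and are not reachable are never
touched by repack() in this model, which is the point of the scheme.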

So since git pack-objects isn't even looking in the cruft-* packs
except when it needs to rescue an object, the objects in the cruft-*
packs won't get copied, and we won't need to have per-object mtimes.
It also means it will go faster since it's not copying the cruft-*
packs at all, and possibly not even looking at them.

Now all we need to do is delete any cruft-* packs which are older than
the expiry window.  We don't even need to look at their contents.
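
A minimal sketch of that expiry step, assuming the cruft-<SHA1>
naming proposed above and that a cruft pack's age is taken from the
mtime of its .pack file.  The function name expire_cruft_packs() and
its parameters are hypothetical; nothing here is existing git
behaviour.

```python
import time
from pathlib import Path


def expire_cruft_packs(pack_dir, expiry_seconds, now=None):
    """Delete cruft-*.pack files (and their .idx) older than the window.

    Only file names and mtimes are examined; pack contents are never
    opened.  Returns the names of the .pack files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for pack in sorted(Path(pack_dir).glob("cruft-*.pack")):
        if now - pack.stat().st_mtime > expiry_seconds:
            # Remove the matching index along with the pack itself.
            pack.with_suffix(".idx").unlink(missing_ok=True)
            pack.unlink()
            removed.append(pack.name)
    return removed
```

For example, expire_cruft_packs(".git/objects/pack", 14 * 24 * 3600)
would drop cruft packs more than two weeks old; the two-week figure is
just an example value for the expiry window.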

It does imply that we may accumulate a new cruft-<SHA1> pack each time
we run git gc, but users shouldn't be running git gc all that often
anyway.  And even if they do run it all the time, it will still be
more efficient than keeping the unreachable objects as loose objects.

                                        - Ted