Re: Keeping unreachable objects in a separate pack instead of loose?
Jeff King <peff <at> peff.net> writes:

> > Then, the creation of unreferenced objects from successive 'git add'
> > shouldn't create that many objects in the first place. They currently
> > never get the chance to be packed to start with.
>
> I don't think these objects are necessarily from successive "git add"s.
> That is one source, but they may also come from reflogs expiring. I
> guess in that case that they would typically be in an older pack,
> though.
...
> That is satisfyingly simple, but the storage requirement is quite bad.
> The unreachable objects are very much in the minority, and an
> occasional duplication there is not a big deal; duplicating all of the
> reachable objects would double the object directory's size.

(I don't think this is a valid generalization for servers.)

I am sorry to be coming a bit late into this discussion, but I think there is an even worse case, one that can cause much larger loose-object explosions and which does not seem to have been mentioned yet: the "server rejected the upload" case.

For example, think of a client pushing a change from the wrong repository to a server. Since there is no history in common, the client pushes the entire repository, and if the server rejects it for some reason (perhaps a pre-receive hook, or a Gerrit server saying "way too many new changes..."), the pack file may stay abandoned on the server. When gc runs: boom, the entire history of that other project explodes into loose objects, and none of them get pruned, since the pack file may be fairly new!

I believe this has happened to us several times fairly recently. We have a tiny project which some people keep confusing with the kernel, and they push a change destined for the kernel to it. Gerrit rejects it, and their massive packfile (larger than the entire project) stays around. If gc runs, it almost becomes a DoS for us: the sheer number of loose object files makes the system crawl when accessing that repo, even on an SSD.
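For what it's worth, the mechanism is easy to demonstrate locally without a server at all: gc (via "git repack -A") writes recently-packed unreachable objects back out loose, and prune then leaves them alone while they are younger than gc.pruneExpire (two weeks by default). A minimal sketch — repo name, commit messages, and file contents are all arbitrary:

```shell
# Sketch of the explosion: pack some objects, make them unreachable,
# then watch "git gc" write them back out loose.
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m base
echo payload > f && git add f && git commit -q -m extra
git repack -a -d -q                    # everything is now in a single pack
git reset -q --hard HEAD^              # "extra" becomes unreachable
git reflog expire --expire=now --all   # drop the reflog entries pinning it
git count-objects -v                   # count: 0  (nothing loose yet)
git gc -q                              # repack -A explodes the unreachable objects
git count-objects -v                   # loose objects again, unprunable for 2 weeks
```

In the rejected-push case the same thing happens, just with an entire foreign history's worth of objects in the abandoned pack instead of one commit.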
We have been talking about moving to NFS soon (with packfiles, git should still perform fairly well on NFS), but this explosion really scares me. The current design looks like a DoS just waiting to happen for servers.

While I would love to see the races discussed in this thread eliminated, I agree with Ted that the first fix should simply be to never expand loose objects for the sake of pruning. (If certain objects don't do well in pack files and the local gc policy says they should be loose, go ahead and expand them, but that should be unrelated to pruning.) People can DoS a server with unused packfiles too, but that will rarely have the same impact that loose objects have.

-Martin

--
Employee of Qualcomm Innovation Center, Inc. which is a member of Code Aurora Forum