[PATCH v2 00/10] Large blob fixes
These patches make sure we avoid keeping a whole blob in memory, at
least in the common cases. Blob-only streaming code paths are added to
accomplish that.

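As a rough illustration of what "streaming" means here, the sketch
below copies data between two file descriptors through a fixed-size
buffer, so memory use stays constant regardless of blob size. This is
not code from the series: copy_stream_chunked() and the 16 KiB buffer
size are made up for the example, and the patches themselves go through
git's streaming interface (streaming.h) rather than raw read()/write().

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not git's API: copy everything from in_fd to
 * out_fd using a fixed-size buffer. Returns the number of bytes
 * copied, or -1 on a read/write error.
 */
static long copy_stream_chunked(int in_fd, int out_fd)
{
	char buf[16384]; /* assumed chunk size, chosen arbitrarily */
	long total = 0;

	for (;;) {
		ssize_t got, off = 0;

		got = read(in_fd, buf, sizeof(buf));
		if (got < 0) {
			if (errno == EINTR)
				continue; /* interrupted, retry the read */
			return -1;
		}
		if (got == 0)
			break; /* end of input */

		/* write() may be partial, so loop until the chunk is flushed */
		while (off < got) {
			ssize_t put = write(out_fd, buf + off, (size_t)(got - off));
			if (put < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += put;
		}
		total += got;
	}
	return total;
}
```

Only one chunk is held at a time, which is the property the blob-only
code paths are after.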
There are a few things I'd like to see addressed, perhaps as part of
GSoC if any student steps up.

 - somehow avoid unpack-objects and keep the pack if it contains large
   blobs. I guess we could just save the pack, then decide whether to
   run unpack-objects later. I've updated the GSoC ideas page about
   this.

 - pack-objects still puts large blobs in memory if they are in loose
   format. This should not happen if we fix the above. But if anyone
   has spare energy, they can try to stream large loose blobs into the
   pack too. Not sure how ugly the end result would be.

 - archive-zip with large blobs. I think two phases are required
   because we need to calculate crc32 in advance. I have a feeling
   that we could just stream compressed blobs (either in loose or
   packed format) to the zip file, i.e. no decompressing then
   compressing, which makes two phases nearly as good as one.

 - not really large blob related, but it'd be great to see
   pack-check.c and index-pack.c share as much pack reading code as
   possible, even better if sha1_file.c could join the party.

 - I've been wondering whether we could just drop pack-check.c, which
   is only used by fsck, and make fsck run index-pack instead. The
   pro is that we can run index-pack in parallel. The con is how to
   return the marked object list to fsck efficiently.

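To illustrate the first of the two phases in the archive-zip item
above: a CRC-32 can be accumulated chunk by chunk, so the checksum pass
itself does not need the whole blob in memory. A minimal sketch, with a
hypothetical crc32_update() written out bit by bit for the sake of being
self-contained (real code would more likely call zlib's crc32(), which
is chained the same way):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold len more bytes into a running CRC-32
 * (reflected polynomial 0xEDB88320). Start with crc = 0 and pass the
 * previous return value back in for each following chunk.
 */
static uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
	size_t i;
	int bit;

	crc = ~crc; /* restore the internal (inverted) state */
	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			/* shift out one bit; xor in the polynomial if it was set */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

Feeding "1234" and then "56789" gives the same value as feeding
"123456789" in one call (0xCBF43926, the standard check value), so the
first phase can run over a stream and only the second phase has to
write the data out.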
Anyway, changes from v1:

 - use stream_blob_to_fd() patch from Junio (better factoring)

 - split show_object() in "git show" into two separate functions, one
   for tags and one for blobs, as they do not share much in the end

 - get rid of the "index-pack --verify" patch. It'll come back
   separately

Junio C Hamano (1):
  streaming: make streaming-write-entry to be more reusable

Nguyễn Thái Ngọc Duy (9):
  Add more large blob test cases
  cat-file: use streaming interface to print blobs
  parse_object: special code path for blobs to avoid putting whole
    object in memory
  show: use streaming interface for showing blobs
  index-pack: split second pass obj handling into own function
  index-pack: reduce memory usage when the pack has large blobs
  pack-check: do not unpack blobs
  archive: support streaming large files to a tar archive
  fsck: use streaming interface for writing lost-found blobs

archive-tar.c | 35 +++++++++++++++----
archive-zip.c | 9 +++--
archive.c | 51 ++++++++++++++++++---------
archive.h | 11 +++++-
builtin/cat-file.c | 23 ++++++++++++
builtin/fsck.c | 8 +---
builtin/index-pack.c | 95 ++++++++++++++++++++++++++++++++++++--------------
builtin/log.c | 34 ++++++++++-------
cache.h | 2 +-
entry.c | 53 +++-------------------------
fast-import.c | 2 +-
object.c | 11 ++++++
pack-check.c | 21 ++++++++++-
sha1_file.c | 78 +++++++++++++++++++++++++++++++++++------
streaming.c | 55 +++++++++++++++++++++++++++++
streaming.h | 2 +
t/t1050-large.sh | 59 ++++++++++++++++++++++++++++++-
wrapper.c | 27 ++++++++++++--
18 files changed, 434 insertions(+), 142 deletions(-)
--
1.7.8.36.g69ee2