On 05/10, Michael Haggerty wrote:
> On 05/10/2012 02:19 PM, Thomas Gummerer wrote:
> >== Flat loading
> >
> >Since internally git expects and works with lexicografic ordering,
> >a simple linear scan throught the subdirectories doesn't give
> >the right internal sorting. To achieve the right internal sorting
> >the loading will be done in the following way:
> >
> >The data structure is a stack of queues, to allow continous reading
> >of the file.
> >
> >s -> queue1
> >t -> queue2
> >a -> queue3
> >c -> queue4
> >k -> queue5
> >
> >dirs = read_all_directories
> >
> >foreach dir in dirs do
> > file = read_next_file
> >
> > while element_on_top_of_stack.first_element< nextdir
> > indexentries.append(dequeue(element_on_top_of_stack))
> > if element_on_top_of_stack == emtpy:
> > remove_element_on_top_of_stack
> >
> > if file[filename]< nextdir
> > indexentries.append(file)
> > else
> > queue.add(file)
> > foreach f in rest_of_files_in_directory:
> > queue.add(f)
> > stack.push(queue)
> >
> >foreach queue in stack:
> > foreach entry in queue:
> > indexentry.append(entry)
>
> 1. It seems to me that the first time that a file with a filename
> before nextdir is encountered, the reading of the directory
> containing the file will be terminated. This would, for example, be
> the case for any directory that contains multiple files but no
> subdirectories.
>
> 2. There is still a lot that is unnecessarily obscure. For example,
> I suppose (but you don't say) that "rest_of_files_in_directory"
> means to read the files at that moment. It would be more explicit
> (and no more verbose) to write
>
> while (f = read_next_file()) != NULL:
> queue.add(f)
>
> 3. You don't cover corner cases, like when read_next_file() is
> called but there are no files left in the directory, or when there
> is no nextdir (which itself is not defined). OK, this pseudocode is
> only meant to be illustrative, so I guess we can wait until your
> real implementation to see such details. On the other hand, you
> probably want to get all the details straight in pseudocode or
> Python before you start translating it into C.
>
> 4. I think the algorithm would be easier to understand and implement
> if it were written recursively. The call stack would replace your
> explicit stack (but you would still need one queue per directory
> level). A key observation is that when "nextdir" precedes the next
> file, then all of the files in subdirectories of nextdir do as well.
> Thus an obvious recursion would be to call a function like
> read_all_files_under_dir(indexentries, dirs, dirindex) at this
> point, which would consume all of the directories that are
> subdirectories of dirs[dirindex] (i.e., all directories whose names
> have the string dirs[dirindex] as a prefix). Using this method
> would mean that there is no need to compare each file under
> dirs[dirindex] against the next file in the outer directory.
>
> Michael
Thanks for your feedback! To get clearer code I've now written a
working reader for the v5 index format in Python. The full reader
would probably be to long for the mailing list, but here is the
interesting part:
def readfiles(directories, dirnr, entries):
global filedata
f.seek(directories[dirnr]["foffset"])
offset = struct.unpack("!I", fread(4))[0]
f.seek(offset)
filedata = list()
queue = list()
i = 0
while i < directories[dirnr]["nfiles"]:
filedata.append(struct.pack("!I", f.tell()))
filename = ""
byte = fread(1)
while byte != '\0':
filename += byte
byte = fread(1)
data = struct.unpack("!HHIII", fread(16))
objhash = fread(20)
readcrc = struct.pack("!i", binascii.crc32("".join(filedata)))
crc = f.read(4)
if readcrc != crc:
print "Wrong CRC: " + filename
filedata = list()
i += 1
queue.append(dict({"name": directories[dirnr]["pathname"] + filename, "flags": data[0], "mode": data[1], "mtimes": data[2], "mtimens": data[3], "statcrc": data[4], "objhash": binascii.hexlify(objhash)}))
if len(directories) > dirnr:
i = 0
while i < len(queue):
if len(directories) - 1 > dirnr and queue[i]["name"] > directories[dirnr + 1]["pathname"]:
entries, dirnr = readfiles(directories, dirnr + 1, entries)
else:
entries.append(queue[i])
i += 1
return entries, dirnr
The full reader can be found here:
https://github.com/tgummerer/git/blob/pythonprototype/git-read-index-v5.py
--
Thomas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Newbies FAQ] [Linux Kernel Development] [Free Online Dating] [Gcc Help] [IETF Annouce] [DCCP] [Netdev] [Networking] [Security] [V4L] [Bugtraq] [Free Online Dating] [Photo] [Yosemite] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Linux SCSI] [Fedora Users] [Linux Resources]