corinthia-dev mailing list archives

From "Dennis E. Hamilton"
Subject RE: DFStorage
Date Thu, 01 Jan 2015 21:08:05 GMT
I have a question that may just be one of nomenclature, ...

 -- replying below to --
From: Peter Kelly
Sent: Thursday, January 1, 2015 02:58
Subject: DFStorage

I realise that I haven’t done a very good job of documenting the code in Corinthia, as you’ve
probably noticed :) I’ve been meaning to get around to this for a while now.

[ ... ]

Now with the current implementation, which is a very simplistic one, it simply reads the whole
zip file into memory. This is largely due to a limitation in the minizip API, which enforces
sequential access to the entries in a file. It would be conceivable to have the zip DFStorage
implementation first read a directory listing, and then for each file that’s requested,
do a linear scan through all the entries before finding the requested file, and then reading
that. This would be an O(n) operation, but would be unlikely to be a major problem since most
zip packages we’re dealing with will only have a fairly small number of entries.

Minizip does not provide any way to cache the location in the zip file of a particular entry,
even though this information would be possible to obtain in theory (just not through minizip’s
API). If I were writing a zip implementation from scratch (and maybe this is something we could
consider), I would have it read a list of all entries and remember their locations in a hash
table, so that when a particular named entry is requested, we can go directly to that point
in the file without having to do a linear scan.
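
To make the idea concrete, here is a minimal sketch in Python (not Corinthia's C, and not a description of how minizip or DFStorage behaves). It relies on the standard zipfile module, which reads the directory once and exposes each entry's local header offset. The names make_sample_zip and build_index are hypothetical, chosen only for this illustration.

```python
import io
import zipfile

def make_sample_zip():
    """Build a small in-memory zip standing in for a document package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", "<doc>hello</doc>")
        zf.writestr("styles.xml", "<styles/>")
    return buf.getvalue()

def build_index(zf):
    """Map entry name -> offset of its local header, taken from the directory."""
    return {info.filename: info.header_offset for info in zf.infolist()}

data = make_sample_zip()
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    index = build_index(zf)
    # Lookup by name is a dictionary access, not a scan over all entries.
    offset = index["content.xml"]
    # Every local file header begins with the signature "PK\x03\x04".
    signature = data[offset:offset + 4]
    content = zf.read("content.xml")
```

With an index like this, the cost of finding an entry no longer depends on how many entries the package holds.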

[ ... ]

   @Peter, I want to verify that we have the same understanding of the Zip file.

   The Zip file itself has a global directory to all of the component files at the end of
the file.  The global directory provides offsets to where each component file begins in the
Zip stream and also provides other pertinent information.

   To produce a Zip file, minizip would need to remember all of this to append to the stream
once all of the part files are written out.

   The global directory could certainly be cached and, if necessary, indexed from a hash table
on the names of the component parts.
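
As a rough sketch of what reading and indexing that directory involves, the Python below locates the end-of-directory record at the tail of the stream and walks the directory entries into a dictionary keyed by name. It assumes a single-disk archive with no Zip64 extensions and no archive comment that happens to contain the end-record signature. The function name read_directory is hypothetical, and this is not how minizip is implemented.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory record
CDFH_SIG = b"PK\x01\x02"   # central directory file header
EOCD_FMT = "<4sHHHHIIH"    # fixed 22-byte part of the end record
CDFH_FMT = "<4s6H3I5H2I"   # fixed 46-byte part of each directory entry

def read_directory(data):
    """Return {name: (local_header_offset, compressed_size)} from raw zip bytes."""
    # The end record sits at the tail of the file, so search backwards for it.
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("no end-of-central-directory record found")
    eocd = struct.unpack_from(EOCD_FMT, data, eocd_pos)
    total_entries, cd_offset = eocd[4], eocd[6]

    entries = {}
    pos = cd_offset
    for _ in range(total_entries):
        fields = struct.unpack_from(CDFH_FMT, data, pos)
        if fields[0] != CDFH_SIG:
            raise ValueError("corrupt central directory")
        compressed_size = fields[8]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_offset = fields[16]
        name_start = pos + struct.calcsize(CDFH_FMT)
        name = data[name_start:name_start + name_len].decode("utf-8")
        entries[name] = (local_offset, compressed_size)
        # Variable-length name, extra field and comment follow the fixed part.
        pos = name_start + name_len + extra_len + comment_len
    return entries

# Build a tiny archive in memory and index it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", "<doc/>")
    zf.writestr("styles.xml", "<styles/>")
raw = buf.getvalue()
directory = read_directory(raw)
```

The dictionary plays the role of the hash table on part names; a reader would seek to the stored offset to reach a part's local header.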

   Without looking at minizip, I would assume that there has to be some internal representation
of the global directory even if it is not exposed.  Would it be useful to exploit that somehow
in elevating a better API?

   So long as the Zip stream can be read via random access, it is normal to access the global
directory first and then access the parts based on the global directory, even if access is
in sequential order of those parts in the stream.  That helps detect apparent corruption of
the Zip and it is essential when the header for a component file does not specify the length
of the file data.
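
The last point can be illustrated with a short Python sketch. When an archive is written to a stream that cannot seek, Python's zipfile module sets general-purpose flag bit 3 and leaves the sizes in the local header as zero, so only the global directory says how long the data is. The WriteOnly class and the local_header_info helper are hypothetical names for this illustration, and the behaviour shown is that of Python's zipfile, not of minizip.

```python
import io
import struct
import zipfile

LOCAL_FMT = "<4s5H3I2H"   # fixed 30-byte part of a local file header

class WriteOnly:
    """A sink that accepts writes but cannot seek or tell, like a pipe."""
    def __init__(self):
        self.chunks = []
    def write(self, b):
        self.chunks.append(bytes(b))
        return len(b)
    def flush(self):
        pass

def local_header_info(data, offset):
    """Return (flags, compressed_size, uncompressed_size) from a local header."""
    fields = struct.unpack_from(LOCAL_FMT, data, offset)
    if fields[0] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    return fields[2], fields[7], fields[8]

payload = b"<doc>hello</doc>"
sink = WriteOnly()
with zipfile.ZipFile(sink, "w") as zf:
    zf.writestr("content.xml", payload)
raw = b"".join(sink.chunks)

with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    info = zf.getinfo("content.xml")
    flags, local_csize, local_usize = local_header_info(raw, info.header_offset)
    # Bit 3 set means the sizes follow the data instead of preceding it.
    sizes_deferred = bool(flags & 0x08)
    directory_size = info.file_size
    recovered = zf.read("content.xml")
```

A reader that walked only the local headers would not know where this entry's data ends without decoding it, whereas the directory entry carries the length up front.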

   Does this square with your understanding of what is involved in minizip operation?


