corinthia-dev mailing list archives

From Peter Kelly <>
Subject Re: Zip madness !
Date Sat, 01 Aug 2015 17:33:13 GMT
Hi Jan,

I’ve just fixed one bug I found (it was causing a crash, but valgrind helped narrow it down):
a DFextZipDirEntry pointer was being set via an incorrect pointer entry (see my commit to the
newZipExperiment branch for details).

After fixing this I got a correct directory listing of a test document I created in Word -
I only tested it with one file however, so it may not address the problem you ran into with
the particular test file you mentioned.

Dr Peter M. Kelly

PGP key: <>
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

> On 1 Aug 2015, at 10:41 pm, Peter Kelly <> wrote:
> Hi Jan,
> I’ll get to your question in a moment, but I just checked out the newZipExperiment
> branch and noticed that almost all of the source files have changed (I was expecting a
> relatively small diff, with only a few files changed). It looks like most of these
> differences are due to reordering the #includes at the top of each source file. If we’re
> going to do this, could we make it a separate commit in master, so it’s easier to see
> exactly what has changed in the zip branch?
> Actually I normally intentionally put system headers after other headers in the project,
> as it helps to detect cases where a custom header depends on types declared in a system
> header, and where including that custom header (by itself) in a source file would therefore
> result in compilation errors due to the missing declarations. For example DFBuffer.h has an
> #include <stdarg.h> at the top, since some of the functions take the va_list data type. If
> one of us uses this type in another header which doesn’t have #include <stdarg.h>, then any
> C file that includes it (directly or indirectly) has to remember to explicitly include
> stdarg.h (and that could be a *lot* of files, if the header is referenced from lots of
> places). So by placing any system includes needed by the source file after all custom
> headers, we can pick up on these errors more easily.
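The include-order point above can be illustrated with a minimal sketch. The header name logbuffer.h and the LogBuffer_* functions are hypothetical (they are not the DFBuffer API), and the header and source file are shown in one block for brevity. If the header did not include <stdarg.h> itself, this ordering would fail to compile at the first prototype, whereas a system header included earlier might happen to declare va_list and hide the problem.

```c
/* ---- logbuffer.h (hypothetical project header) ---- */
#include <stdarg.h>   /* required here: the prototype below mentions va_list */
#include <stddef.h>   /* required here: size_t */

int LogBuffer_vformat(char *dst, size_t size, const char *fmt, va_list ap);

/* ---- logbuffer.c (hypothetical source file) ---- */
/* #include "logbuffer.h"      project headers come first ...           */
#include <stdio.h>          /* ... system headers for this file after  */

/* Format into dst using an already-started argument list. */
int LogBuffer_vformat(char *dst, size_t size, const char *fmt, va_list ap)
{
    return vsnprintf(dst, size, fmt, ap);
}

/* Convenience wrapper taking a variable number of arguments. */
static int LogBuffer_format(char *dst, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int written = LogBuffer_vformat(dst, size, fmt, ap);
    va_end(ap);
    return written;
}
```

Removing the #include <stdarg.h> line from the header part makes the compiler reject the va_list parameter straight away, which is the early warning this ordering is meant to provide.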
> Regarding the zip file format, I need to look up a few things and will get back to you
> shortly. But I suspect some of the duplication may be related to the fact that a zip file
> is meant to be read backwards. Rather than starting at the beginning of the file, reading
> begins at the end, working backwards through the file to find potentially multiple copies
> of the directory listing. This serves two purposes:
> 1) You can “modify” the contents of a zip file simply by appending (with the compressed
> content of new/changed files added, and a new directory listing including these files, and
> *not* including any files which have been “deleted”, i.e. masked out).
> 2) A zip file can be appended to the end of another file format; the most common example
> being self-extracting .exe files. Since .exe files are read from the beginning, the program
> loader on Windows doesn’t care about the trailing data at the end. And it’s still a valid
> zip file, since the .exe content at the start is ignored when reading the directory listing.
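Reading from the end can be sketched as a backwards scan for the end-of-central-directory record. This is a minimal sketch, not the code in the newZipExperiment branch: the function names are hypothetical, the whole file is assumed to be in memory, and zip64 and multi-disk archives are ignored. The 22-byte fixed record size, the "PK\5\6" signature and the 16-bit comment length come from the zip format.

```c
#include <stddef.h>
#include <stdint.h>

#define EOCD_SIGNATURE   0x06054b50u /* bytes "PK\5\6" on disk (little-endian) */
#define EOCD_MIN_SIZE    22          /* fixed part of the record */
#define EOCD_MAX_COMMENT 0xFFFF      /* trailing comment length is a 16-bit field */

/* Read little-endian integers from a byte buffer. */
static uint16_t readLE16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t readLE32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return the offset of the last end-of-central-directory
 * record in data, or -1 if there is none. The scan starts at the last position
 * where a record could fit and walks towards the start of the file, so any
 * leading data (such as an .exe stub) is never inspected. */
static long findEndOfCentralDirectory(const uint8_t *data, size_t len)
{
    if (len < EOCD_MIN_SIZE)
        return -1;

    /* The record can be followed only by its comment, so stop looking once
     * we are further from the end than the longest possible comment. */
    size_t lowest = 0;
    if (len > EOCD_MIN_SIZE + EOCD_MAX_COMMENT)
        lowest = len - (EOCD_MIN_SIZE + EOCD_MAX_COMMENT);

    for (size_t pos = len - EOCD_MIN_SIZE + 1; pos-- > lowest; ) {
        if (readLE32(data + pos) != EOCD_SIGNATURE)
            continue;
        /* The comment length is stored at offset 20 within the record. */
        size_t commentLen = readLE16(data + pos + 20);
        if (pos + EOCD_MIN_SIZE + commentLen <= len)
            return (long)pos;
    }
    return -1;
}
```

Once the record is found, the fields at offset 10 (total number of entries) and offset 16 (offset of the directory listing) tell a reader where to find the listing itself.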
> I think you may be aware of some of these details already, and there are some nuances
> I’ve probably missed. I’m about to have a look through the code you currently have in
> the branch.
> —
> Dr Peter M. Kelly
> PGP key: <>
> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
>> On 1 Aug 2015, at 4:33 pm, jan i <> wrote:
>> Hi
>> Does anybody know why zip has a mad inefficient directory structure ?
>> I try to understand the why, but fail.
>> A zip file contains 1 global directory with information about every single
>> file (flat structure, no sub directories, but filenames may contain a "/").
>> That is logical and expected.
>> BUT in front of every file there is a local file header, with the filename
>> and about 3/4 of the information from the global directory. This information
>> seems purely redundant and unneeded.
>> What am I missing here ? on one of my test docx, the local headers are
>> about 10% of the filesize (looong filenames) which could be thrown away.
>> Hope somebody can see what I failed to see.
>> rgds
>> jan i.
