commons-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: [compress] Archiver Detection fails
Date Thu, 26 Feb 2009 11:02:14 GMT
On 2009-02-26, Christian Grobmeier <grobmeier@gmail.com> wrote:

>> Note that the method (or better the input stream) is still broken in a
>> more general sense since it will not detect self extracting ZIP files
>> which do have a tiny native bootstrapper tacked to the front of the
>> archive.  The ZipFile class can read them, ZipArchiveInputStream
>> can't.

> Is there a chance that we can fix this in our implementation?

Well, I'll open a JIRA issue for ZipArchiveInputStream anyway; see
below for the biggest problem I see with ZipArchiveInputStream (which
is why Ant never had one).

The specific question of self-extracting archives could be solved by
scanning more of the archive for a local file header and skipping
everything that comes upfront.  The native bootstrap code isn't big,
usually somewhere below 48k, so we could limit the search to a
specific amount of data and avoid scanning several gigabytes.
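A minimal sketch of that bounded search, assuming the stream supports mark/reset (for example a BufferedInputStream) and taking the "below 48k" estimate as a hard limit. `SfxStubSkipper` and `skipToLocalFileHeader` are hypothetical names for illustration, not part of Commons Compress:

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not Commons Compress API): positions a stream at the
 * first local file header that starts within an assumed maximum stub size.
 */
final class SfxStubSkipper {
    /** Assumed search limit, based on the "usually below 48k" estimate. */
    static final int MAX_STUB_SIZE = 48 * 1024;
    /** Local file header signature "PK\003\004" as it appears on disk. */
    private static final byte[] LFH_SIG = {0x50, 0x4b, 0x03, 0x04};

    /**
     * Skips leading stub bytes. Returns the number of bytes skipped, or -1
     * (leaving the stream where it was) if no signature starts within
     * MAX_STUB_SIZE bytes. The stream must support mark/reset.
     */
    static long skipToLocalFileHeader(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        int limit = MAX_STUB_SIZE + LFH_SIG.length;
        in.mark(limit);
        byte[] window = in.readNBytes(limit); // never scans past the limit
        in.reset();
        for (int i = 0; i + LFH_SIG.length <= window.length; i++) {
            if (window[i] == LFH_SIG[0] && window[i + 1] == LFH_SIG[1]
                    && window[i + 2] == LFH_SIG[2] && window[i + 3] == LFH_SIG[3]) {
                in.readNBytes(i); // discard the stub bytes preceding the header
                return i;
            }
        }
        return -1;
    }
}
```

The signature bytes could also occur by chance inside the native stub, so a real implementation would probably want to validate the header it lands on before trusting it.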

I wouldn't want to do that inside the matches method, though.  Rather,
I'd say we don't autodetect self-extracting archives but make
ZipArchiveInputStream deal with them when used explicitly.

Generally speaking, the InputStream metaphor doesn't work for ZIP
archives.

A ZIP archive contains what is called the "central directory" at the
end of the archive.  This is the only authoritative source telling you
what is inside that archive.

Before the central directory there are the actual contents (among
other things).  For each entry you get a local file header describing
the entry (duplicating information from the central directory) and the
actual contents.  The central directory contains a pointer to each
entry's local file data.
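To make that layout concrete, here is a minimal sketch of reading from the back: locate the end of central directory record, then walk the central directory and collect each entry's name together with the offset of its local file header. The field offsets follow PKWARE's APPNOTE.TXT; `CentralDirectoryReader` and `Entry` are hypothetical names, and the sketch assumes a single-disk archive without ZIP64 records, with offsets relative to the start of the byte array, and decodes names as UTF-8 for simplicity:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical central-directory-first reader (no ZIP64, single disk). */
final class CentralDirectoryReader {
    static final int EOCD_SIG = 0x06054b50;  // end of central directory record
    static final int CFH_SIG = 0x02014b50;   // central file header
    static final int EOCD_MIN_SIZE = 22;     // EOCD record without a comment

    static final class Entry {
        final String name;
        final long localHeaderOffset;  // pointer into the local file data

        Entry(String name, long localHeaderOffset) {
            this.name = name;
            this.localHeaderOffset = localHeaderOffset;
        }
    }

    /** Scans backwards, since the EOCD may be followed by a comment of up to 65535 bytes. */
    static int findEocd(ByteBuffer buf) {
        int last = buf.limit() - EOCD_MIN_SIZE;
        int first = Math.max(0, last - 0xFFFF);
        for (int i = last; i >= first; i--) {
            if (buf.getInt(i) == EOCD_SIG) {
                return i;
            }
        }
        throw new IllegalArgumentException("no end of central directory record");
    }

    static List<Entry> list(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = findEocd(buf);
        int count = Short.toUnsignedInt(buf.getShort(eocd + 10));  // total entries
        int pos = (int) Integer.toUnsignedLong(buf.getInt(eocd + 16)); // central directory offset
        List<Entry> entries = new ArrayList<>(count);
        for (int n = 0; n < count; n++) {
            if (buf.getInt(pos) != CFH_SIG) {
                throw new IllegalArgumentException("bad central file header at " + pos);
            }
            int nameLen = Short.toUnsignedInt(buf.getShort(pos + 28));
            int extraLen = Short.toUnsignedInt(buf.getShort(pos + 30));
            int commentLen = Short.toUnsignedInt(buf.getShort(pos + 32));
            long lfhOffset = Integer.toUnsignedLong(buf.getInt(pos + 42));
            String name = new String(archive, pos + 46, nameLen, StandardCharsets.UTF_8);
            entries.add(new Entry(name, lfhOffset));
            pos += 46 + nameLen + extraLen + commentLen;  // next central file header
        }
        return entries;
    }
}
```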

java.util.zip.ZipInputStream reads the stream sequentially and creates
ZipEntry objects as it encounters local file headers.
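For comparison, a minimal sketch of the sequential approach using the JDK's `java.util.zip.ZipInputStream`: every `ZipEntry` it returns is built from a local file header, and `getNextEntry()` returns `null` once it stops finding local file headers, so the central directory is never consulted. `SequentialLister` is a hypothetical name:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper listing entries the way a sequential reader sees them. */
final class SequentialLister {
    /** Returns entry names in the order their local file headers appear in the stream. */
    static List<String> namesInStreamOrder(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() skips any unread data of the previous entry
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName()); // metadata taken from the local file header
            }
        }
        return names;
    }
}
```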

ZipFile (ours, not the one in java.util.zip; I don't know what the
latter does) reads the archive from the back and parses the central
directory to see what is inside the archive.

It is not uncommon for archivers to "update" existing archives by
appending new local file data at the end and rewriting the central
directory without removing the old local file data.  In such a case
java.util.zip.ZipInputStream will find entries that shouldn't be
there or, worse, old data for updated entries.
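The scenario can be reproduced in memory with the JDK alone. The sketch below keeps the old local file data for `a.txt`, appends a complete new archive containing an updated `a.txt`, and patches the appended central directory so its offsets account for the stale prefix, imitating such an in-place update. `StaleEntryDemo` and its methods are hypothetical names; the sketch assumes no archive comment and no ZIP64 records, which holds for the tiny single-entry archives `ZipOutputStream` writes here:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo of stale local file data left behind by an "update". */
final class StaleEntryDemo {
    static final int EOCD_SIZE = 22; // EOCD record size when there is no comment

    static byte[] zipOf(String name, String content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    static byte[] updatedArchive() throws IOException {
        byte[] oldZip = zipOf("a.txt", "old");
        byte[] newZip = zipOf("a.txt", "new");
        ByteBuffer oldBuf = ByteBuffer.wrap(oldZip).order(ByteOrder.LITTLE_ENDIAN);
        int stale = oldBuf.getInt(oldZip.length - EOCD_SIZE + 16); // old central directory offset

        // Old local file data (everything before its central directory) + the whole new archive.
        byte[] result = new byte[stale + newZip.length];
        System.arraycopy(oldZip, 0, result, 0, stale);
        System.arraycopy(newZip, 0, result, stale, newZip.length);

        // Shift the new central directory offset and its single local header pointer.
        ByteBuffer buf = ByteBuffer.wrap(result).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = result.length - EOCD_SIZE;
        int cd = buf.getInt(eocd + 16) + stale;
        buf.putInt(eocd + 16, cd);
        buf.putInt(cd + 42, buf.getInt(cd + 42) + stale);
        return result;
    }

    /** Number of entries according to the central directory ("total entries" field). */
    static int centralDirectoryEntryCount(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        return Short.toUnsignedInt(buf.getShort(archive.length - EOCD_SIZE + 10));
    }

    /** What a sequential reader sees: every local file header, stale or not. */
    static List<String> sequentialView(byte[] archive) throws IOException {
        List<String> seen = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                seen.add(e.getName() + "=" + new String(zin.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
        return seen;
    }
}
```

Reading the result sequentially yields both `a.txt=old` and `a.txt=new`, while the central directory lists a single `a.txt` whose local header pointer refers to the new copy.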

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

