commons-dev mailing list archives

From Stefan Bodewig <>
Subject [compress] Need API Feedback/Advice for ZipArchiveOutputStream and ZIP64
Date Sat, 06 Aug 2011 04:40:24 GMT

There are eight possible permutations of compressed/uncompressed entries
that get written to a seekable-/non-seekable stream whose size is either
known or unknown prior to writing them.

One of them is prohibited (uncompressed/non-seekable/unknown size) and
has been prohibited before, no change here.

For six of the remaining seven permutations ZipArchiveOutputStream
should be generating archives that transparently enable ZIP64 features
for entries if and only if they are too big to be stored without ZIP64.
I.e. the resulting archive will either be readable by an implementation
that doesn't support ZIP64 or it contains files that would be too big
for such an implementation anyway.  The price we pay in some cases is
an additional 20 bytes per entry that are never used by anybody.

The only case that isn't covered so far is compressed / non-seekable
output / input of unknown size.
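
As a rough illustration of the counting above, the eight combinations
can be enumerated and sorted into the three groups.  The class, enum and
method names are made up for this sketch and are not part of Commons
Compress.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that enumerates the eight entry/stream combinations. */
final class EntryPermutations {

    enum Handling { PROHIBITED, TRANSPARENT_ZIP64, NEEDS_UP_FRONT_DECISION }

    /** Classifies one combination of the three yes/no properties. */
    static Handling classify(boolean compressed, boolean seekable, boolean sizeKnown) {
        if (!seekable && !sizeKnown) {
            // The size is not known in advance and the stream cannot be revisited.
            return compressed ? Handling.NEEDS_UP_FRONT_DECISION : Handling.PROHIBITED;
        }
        return Handling.TRANSPARENT_ZIP64;
    }

    /** Returns the classification of all 2 * 2 * 2 combinations. */
    static List<Handling> classifyAll() {
        List<Handling> result = new ArrayList<>();
        boolean[] both = {true, false};
        for (boolean compressed : both) {
            for (boolean seekable : both) {
                for (boolean sizeKnown : both) {
                    result.add(classify(compressed, seekable, sizeKnown));
                }
            }
        }
        return result;
    }
}
```

Counting the results gives one prohibited case, six transparent cases
and one case that needs a decision before the data is written.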

Such entries are stored using a feature that is called the "data
descriptor".  There are two different formats of the data descriptor for
ZIP64 and not-ZIP64 archives and the archive writer has to signal which
type of descriptor it is going to write before it starts writing the
entry's data.
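
To make the difference between the two formats concrete, here is a
minimal sketch of how the descriptor bytes could be laid out.  It
assumes the layout from the ZIP specification (a signature, the CRC-32,
then the compressed and uncompressed sizes as 4-byte or 8-byte
little-endian values) and always writes the signature, which the
specification treats as optional.  The class and method names are
hypothetical and this is not the Commons Compress code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical encoder for the two data descriptor layouts. */
final class DataDescriptorSketch {

    /** Data descriptor signature, "PK\007\010" when written little-endian. */
    static final int SIGNATURE = 0x08074b50;

    /** Largest value that fits into a 4-byte size field. */
    static final long MAX_32_BIT = 0xFFFFFFFFL;

    static byte[] encode(long crc32, long compressedSize, long size, boolean zip64) {
        if (!zip64 && (compressedSize > MAX_32_BIT || size > MAX_32_BIT)) {
            throw new IllegalArgumentException("sizes do not fit the non-ZIP64 descriptor");
        }
        int sizeFieldLength = zip64 ? 8 : 4;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 2 * sizeFieldLength)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(SIGNATURE);
        buf.putInt((int) crc32);            // keeps the low 32 bits of the CRC
        if (zip64) {
            buf.putLong(compressedSize);
            buf.putLong(size);
        } else {
            buf.putInt((int) compressedSize);
            buf.putInt((int) size);
        }
        return buf.array();
    }
}
```

With the signature included, the non-ZIP64 descriptor is 16 bytes long
and the ZIP64 one is 24 bytes long, so a reader has to know in advance
which of the two to expect.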

This means ZipArchiveOutputStream must decide whether it is going to use
the ZIP64 format before it knows whether it would actually need it or
not.  If it signals it is going to use ZIP64 then an implementation that
doesn't support ZIP64 (like Compress 1.2 or may fail to
read the archive, which is bad if the entry turns out to be smaller than
4GiB.  If it doesn't signal ZIP64 it can't write big entries at all.
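
The dilemma can be sketched as a check that only becomes possible after
the entry's data has been written.  The names below are hypothetical;
the constant reflects that 0xFFFFFFFF is the largest value of a 4-byte
size field and that the ZIP format also uses it as the marker for "look
at the ZIP64 field instead".

```java
/** Hypothetical check that can only run once the real sizes are known. */
final class Zip64Limit {

    /** Largest 4-byte value; sizes at or above it need ZIP64. */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    static boolean needsZip64(long size, long compressedSize) {
        return size >= ZIP64_MAGIC || compressedSize >= ZIP64_MAGIC;
    }

    /** To be called after the entry's data has been written. */
    static void checkAfterWriting(long size, long compressedSize, boolean zip64Signalled) {
        if (needsZip64(size, compressedSize) && !zip64Signalled) {
            // Too late: the descriptor format was fixed before the data was written.
            throw new IllegalStateException("entry is too big for a non-ZIP64 data descriptor");
        }
    }
}
```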

This decision can be made at the granularity of a single entry.  I.e. it
is possible to not use ZIP64 for the majority of entries and enable it
for individual entries.

IMHO there is no right or wrong decision here that the library could
make.  The user-code will have to decide whether ZIP64 should be enabled
or not.  The main questions to me are whether we want to attach this
decision to the stream or the entry itself and what the default should
be.

InfoZIP's ZIP has decided to make it an option for the whole archive
(the command line doesn't offer much flexibility here) and make it
default to ZIP64.

My current thinking is that is a likely candidate for the
receiving end of ZIPs we create, so it may be better to turn ZIP64 off
by default, but I'm not sure.

I'm leaning towards adding a setUseZip64(boolean) method at the level of
ZipArchiveOutputStream and making it default to false.  This method could
be called in between putArchiveEntry calls to make it apply selectively
to individual entries.
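
A minimal sketch of how such a stream-level switch might behave when it
is toggled between entries.  This is a stand-in class written for
illustration only: it records the decision per entry name and writes no
archive data, and everything apart from the setUseZip64 and
putArchiveEntry names is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the stream; it only records the per-entry decision. */
final class Zip64ToggleSketch {

    private boolean useZip64 = false;   // the default proposed in the message
    private final Map<String, Boolean> decisions = new LinkedHashMap<>();

    void setUseZip64(boolean useZip64) {
        this.useZip64 = useZip64;
    }

    /** The setting in effect when the entry is started applies to that entry. */
    void putArchiveEntry(String entryName) {
        decisions.put(entryName, useZip64);
    }

    Map<String, Boolean> decisions() {
        return decisions;
    }

    public static void main(String[] args) {
        Zip64ToggleSketch out = new Zip64ToggleSketch();
        out.putArchiveEntry("small.txt");   // written without ZIP64
        out.setUseZip64(true);
        out.putArchiveEntry("huge.bin");    // written with ZIP64
        out.setUseZip64(false);
        out.putArchiveEntry("notes.txt");   // back to the default
        System.out.println(out.decisions());
    }
}
```

Attaching the flag to the entry instead would avoid the hidden state
between calls, at the cost of a ZIP-specific property on the entry.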

The name is totally open for debate since, as it stands, it sounds as if
you could turn off all ZIP64 features, which I wouldn't want to do for
the cases that can be dealt with transparently.  Then again, it could use
a Boolean argument with "null" meaning "do the best you can" and "false"
meaning "don't even use ZIP64 if you think it is safe".

Any ideas?
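
For discussion, one possible reading of the three-valued Boolean idea is
sketched below.  The method name is hypothetical, and the behaviour for
"true" and for "null" with an unknown size is an assumption, since only
the meanings of "null" and "false" are spelled out above.

```java
/** Hypothetical interpretation of a Boolean (null / false / true) setting. */
final class Zip64TriState {

    /** Sizes at or above this value do not fit into a 4-byte field. */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    /** Decides whether to signal ZIP64 before the entry's data is written. */
    static boolean signalZip64(Boolean mode, boolean sizeKnown, long size) {
        boolean knownToBeTooBig = sizeKnown && size >= ZIP64_MAGIC;
        if (mode == null) {
            // "do the best you can": use ZIP64 only where the need is known.
            // Assumption: with an unknown size this falls back to no ZIP64.
            return knownToBeTooBig;
        }
        if (!mode) {
            // "don't even use ZIP64 if you think it is safe"
            return false;
        }
        // Assumption: true also opts in when the size is unknown.
        return knownToBeTooBig || !sizeKnown;
    }
}
```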

