commons-dev mailing list archives

From Phil Steitz <>
Subject Re: [compress] Need API Feedback/Advice for ZipArchiveOutputStream and ZIP64
Date Sat, 06 Aug 2011 17:42:03 GMT
On 8/5/11 9:40 PM, Stefan Bodewig wrote:
> Hi,
> there are eight possible permutations of compressed/uncompressed entries
> that get written to a seekable-/non-seekable stream whose size is either
> known or unknown prior to writing them.
> One of them is prohibited (uncompressed/non-seekable/unknown size) and
> has been prohibited before, no change here.
> For six of the remaining seven permutations ZipArchiveOutputStream
> should be generating archives that transparently enable ZIP64 features
> for entries if and only if they are too big to be stored without ZIP64.
> I.e. the resulting archive will either be readable by an implementation
> that doesn't support ZIP64 or it contains files that would be too big
> for such an implementation anyway.  The price we pay in some cases is
> an additional 20 bytes per entry that are never used by anybody.
> The only case that isn't covered so far is compressed / non-seekable
> output / input of unknown size.
> Such entries are stored using a feature that is called the "data
> descriptor".  There are two different formats of the data descriptor for
> ZIP64 and not-ZIP64 archives and the archive writer has to signal which
> type of descriptor it is going to write before it starts writing the
> entry's data.
> This means ZipArchiveOutputStream must decide whether it is going to use
> the ZIP64 format before it knows whether it would actually need it or
> not.  If it signals it is going to use ZIP64 then an implementation that
> doesn't support ZIP64 (like Compress 1.2 or may fail to
> read the archive, which is bad if the entry turns out to be smaller than
> 4GiB.  If it doesn't signal ZIP64 it can't write big entries at all.
> This decision can be made at the granularity of a single entry.  I.e. it
> is possible to not use ZIP64 for the majority of entries and enable it
> for individual entries.
> IMHO there is no right or wrong decision here that the library could
> make.  The user-code will have to decide whether ZIP64 should be enabled
> or not.  The main questions to me are whether we want to attach this
> decision to the stream or the entry itself and what the default should
> be.
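
For reference, the eight permutations described above can be enumerated in a few lines. This is only a sketch of the classification as the quoted message states it; the class, enum and method names are hypothetical and are not part of Commons Compress.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these names are hypothetical, not Commons Compress API.
final class Zip64Permutations {

    enum Outcome { PROHIBITED, TRANSPARENT, NEEDS_USER_DECISION }

    // Classifies one combination the way the quoted message describes it.
    static Outcome classify(boolean compressed, boolean seekable, boolean sizeKnown) {
        if (!compressed && !seekable && !sizeKnown) {
            return Outcome.PROHIBITED;          // uncompressed / non-seekable / unknown size
        }
        if (compressed && !seekable && !sizeKnown) {
            return Outcome.NEEDS_USER_DECISION; // descriptor format must be chosen up front
        }
        return Outcome.TRANSPARENT;             // ZIP64 only if the entry is too big
    }

    // Lists all eight combinations together with their classification.
    static List<String> describeAll() {
        List<String> rows = new ArrayList<>();
        boolean[] flags = {true, false};
        for (boolean compressed : flags) {
            for (boolean seekable : flags) {
                for (boolean sizeKnown : flags) {
                    rows.add((compressed ? "compressed" : "uncompressed") + " / "
                            + (seekable ? "seekable" : "non-seekable") + " / "
                            + (sizeKnown ? "known size" : "unknown size") + " -> "
                            + classify(compressed, seekable, sizeKnown));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        describeAll().forEach(System.out::println);
    }
}
```

Only the single NEEDS_USER_DECISION row is what the rest of this thread is about; the six TRANSPARENT rows need no new API.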

Can you think of practical use cases where setting at the entry
level is needed?
> InfoZIP's ZIP has decided to make it an option for the whole archive
> (the command line doesn't offer much flexibility here) and make it
> default to ZIP64.
> My current thinking is that is a likely candidate for the
> receiving end of ZIPs we create, so it may be better to turn ZIP64 off
> by default, but I'm not sure.
> I'm leaning towards adding a setUseZip64(boolean) method at the level of
> ZipArchiveOutputStream and making it default to false.  This method could
> be called in between putArchiveEntry calls to make it apply selectively
> to individual entries.

Sounds reasonable.
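
To make the proposal concrete, here is a rough sketch of how a stream-level flag could still apply per entry when it is toggled between putArchiveEntry calls. The class below is a hypothetical stand-in that writes no ZIP data; only the method names setUseZip64 and putArchiveEntry come from the proposal, and everything else is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the stream; it writes no real ZIP data.
final class Zip64ToggleSketch {

    private boolean useZip64 = false;               // proposed default: off
    private final List<String> entries = new ArrayList<>();

    void setUseZip64(boolean useZip64) {
        this.useZip64 = useZip64;
    }

    // Records which descriptor format would be signalled when the entry starts.
    void putArchiveEntry(String name) {
        entries.add(name + " -> " + (useZip64 ? "ZIP64" : "non-ZIP64") + " data descriptor");
    }

    List<String> entries() {
        return entries;
    }

    // Toggles the flag between entries, as suggested in the quoted message.
    static List<String> demo() {
        Zip64ToggleSketch out = new Zip64ToggleSketch();
        out.putArchiveEntry("small.txt");   // default applies
        out.setUseZip64(true);              // switched on for the next entry only
        out.putArchiveEntry("huge.bin");
        out.setUseZip64(false);
        out.putArchiveEntry("notes.txt");
        return out.entries();
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The flag is read at the moment an entry starts, which matches the constraint that the writer has to signal the descriptor type before it writes the entry's data.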
> The name is totally open for debate since, as it stands, it sounds as if
> you could turn off all Zip64 features, which I wouldn't want to do for
> the cases that can be dealt with transparently.  Then again, it could use
> a Boolean argument with "null" meaning "do the best you can" and false
> meaning "don't even use Zip64 if you think it is safe".

I don't get what you mean by "do the best you can."  Does that mean
turning it on, per entry, whenever you somehow know it is needed?

Libraries that try to be too smart tend to be hard on both users and
maintainers, so IIUC what is going on here, I would recommend KISS:
a simple boolean property.
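
To show the difference between the two shapes, here is one possible reading of each. All names are hypothetical, and the behaviour for null combined with an unknown size is purely an assumption of this sketch, since that is exactly the case the thread leaves open.

```java
// Hypothetical helper contrasting the two API shapes; one possible reading only.
final class Zip64ModeSketch {

    // 0xFFFFFFFF in a 32-bit size field is the marker that redirects readers
    // to the ZIP64 extra field, so sizes at or above it need ZIP64.
    static final long ZIP64_THRESHOLD = 0xFFFFFFFFL;

    // Sentinel used by this sketch for "size not known before writing".
    static final long UNKNOWN_SIZE = -1L;

    // Simple boolean property: the caller's choice is used as given.
    static boolean withBoolean(boolean useZip64) {
        return useZip64;
    }

    // Tri-state: TRUE forces ZIP64, FALSE forbids it, null decides from what is known.
    // Assumption: with null and an unknown size this sketch falls back to non-ZIP64.
    static boolean withTriState(Boolean mode, long size) {
        if (mode != null) {
            return mode;
        }
        return size != UNKNOWN_SIZE && size >= ZIP64_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(withBoolean(false));
        System.out.println(withTriState(null, 5L * 1024 * 1024 * 1024));
        System.out.println(withTriState(null, UNKNOWN_SIZE));
    }
}
```

The boolean version has nothing to explain, whereas the tri-state version needs a documented rule for every combination of mode and size knowledge.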

> Any ideas?
> Stefan