commons-dev mailing list archives

From Stefan Bodewig <>
Subject Re: [compress] ZIP64: API imposed limits vs limits of the format
Date Fri, 05 Aug 2011 10:15:23 GMT
On 2011-08-04, Lasse Collin wrote:

> On 2011-08-04 Stefan Bodewig wrote:
>> There are a few places where our implementation doesn't allow for the
>> full range the ZIP format would support.  Some are easy to fix, some
>> hard and I'm asking for feedback whether you consider it worth the
>> effort to fix them at all.

> I guess that these are enough for the foreseeable future:

>     Max archive size:             Long.MAX_VALUE
>     Max size of individual entry: Long.MAX_VALUE
>     Max number of file entries:   Integer.MAX_VALUE

This is what we currently have in all three classes.
ZipArchiveInputStream could in theory go beyond that in all three cases
(for the individual entry we'd need to store sizes in ZipArchiveEntry as
BigInteger, but no other change would be required).
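
To illustrate why a long is not quite enough for the full range of the
format: ZIP64 size fields are eight bytes wide and unsigned, so the
largest value is 2^64-1, which is above Long.MAX_VALUE.  A minimal
sketch of reading such a field into a BigInteger follows.  The class
and method names are hypothetical and are not part of Commons Compress.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class Zip64Size {
    /**
     * Interprets eight little-endian bytes starting at offset as an
     * unsigned 64-bit integer. Hypothetical helper for illustration.
     */
    public static BigInteger readUnsignedLE64(byte[] b, int offset) {
        byte[] bigEndian = new byte[8];
        for (int i = 0; i < 8; i++) {
            // BigInteger expects the most significant byte first
            bigEndian[i] = b[offset + 7 - i];
        }
        // signum 1 makes BigInteger treat the bytes as a non-negative magnitude
        return new BigInteger(1, bigEndian);
    }

    public static void main(String[] args) {
        byte[] allOnes = new byte[8];
        Arrays.fill(allOnes, (byte) 0xFF);
        BigInteger max = readUnsignedLE64(allOnes, 0);
        System.out.println(max); // 18446744073709551615
        // true: the value does not fit into a signed long
        System.out.println(max.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0);
    }
}
```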

> Java APIs don't support bigger files, and I guess that files this big
> won't be common even if file system sizes allowed them. If you write
> ten terabytes per second, it will still take well over a week to
> create an archive of 2^63-1 bytes.
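
The write-time estimate above is easy to check.  A small sketch,
assuming "ten terabytes per second" means 10 * 10^12 bytes per second
(with binary terabytes the result is a bit lower but still more than a
week); the class and method names are made up for the example.

```java
import java.util.Locale;

public class WriteTime {
    /** Days needed to write the given number of bytes at a constant rate. */
    public static double daysToWrite(long bytes, double bytesPerSecond) {
        double seconds = bytes / bytesPerSecond;
        return seconds / 86_400.0; // 86,400 seconds per day
    }

    public static void main(String[] args) {
        // Assumption: decimal terabytes, i.e. 10 TB/s = 1e13 bytes per second
        double rate = 10e12;
        double days = daysToWrite(Long.MAX_VALUE, rate);
        // prints "10.7 days"
        System.out.println(String.format(Locale.ROOT, "%.1f days", days));
    }
}
```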


> I don't know how much memory one file entry needs, but let's assume
> it takes only 50 bytes, including the overhead of the linked list
> etc. Keeping a list of 2^31-1 files will then need 100 GiB of RAM.

As the code currently stands a single entry is likely way bigger than 50
bytes.  Even if we tried to keep only the data that is needed for the
central directory and store that as raw bytes rather than Java objects a
single central directory entry would still need the file name plus at
least about twenty-five bytes.

This doesn't look as if it was worth implementing a linked list; I
should have calculated the total memory needed myself, I guess.
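
For reference, the calculation as a rough sketch.  The 50 bytes per
entry is Lasse's figure; the second case uses the roughly twenty-five
bytes of raw central directory data plus an average file name length
of 20 bytes, which is purely an assumption for the example.  Class and
method names are hypothetical.

```java
import java.util.Locale;

public class EntryMemory {
    private static final double GIB = 1024.0 * 1024.0 * 1024.0;

    /** Total bytes for the given number of entries; throws on long overflow. */
    public static long totalBytes(long entries, long bytesPerEntry) {
        return Math.multiplyExact(entries, bytesPerEntry);
    }

    public static double toGiB(long bytes) {
        return bytes / GIB;
    }

    public static void main(String[] args) {
        long entries = Integer.MAX_VALUE; // 2^31-1 entries

        // 50 bytes per entry, the figure from the quoted estimate
        long optimistic = totalBytes(entries, 50);
        // prints "100.0 GiB"
        System.out.println(String.format(Locale.ROOT, "%.1f GiB", toGiB(optimistic)));

        // Assumed: about 25 bytes of raw data plus a 20-byte file name
        long rawBytes = totalBytes(entries, 25 + 20);
        // prints "90.0 GiB"
        System.out.println(String.format(Locale.ROOT, "%.1f GiB", toGiB(rawBytes)));
    }
}
```

Either way the list would not fit into the memory of any ordinary
machine, long before the Integer.MAX_VALUE limit is reached.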

> Even if the number of files is limited to Integer.MAX_VALUE, it can be
> good to think about the memory usage of the data structures used for
> the file entries.

True, I'll think about what I currently store inside the linked list in
ZipArchiveOutputStream a bit more.  Inside of ZipFile it really doesn't
make much sense to store the data in any other way than ZipArchiveEntry
objects, which are not really optimized for size (and inherit from
java.util.zip.ZipEntry, so our hands are pretty much tied).

