commons-dev mailing list archives

From Torsten Curdt <tcu...@vafer.org>
Subject Re: [compress] ZIP64: API imposed limits vs limits of the format
Date Thu, 04 Aug 2011 15:03:56 GMT
> ZipFile relies on RandomAccessFile so any archive can't be bigger than
> the maximum size supported by RandomAccessFile.  In particular the seek
> method expects a long as argument so the hard limit would be an archive
> size of 2^63-1 bytes.  In practice I expect RandomAccessFile to not
> support files that big on many platforms.

Yeah ... let's cross that bridge when people complain ;)
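For the record, the gap between the format's limit and Java's is a clean "factor of 2": ZIP64 allows an unsigned 64-bit value, while `RandomAccessFile.seek(long)` tops out at `Long.MAX_VALUE`. A back-of-the-envelope check (class name made up for illustration):

```java
import java.math.BigInteger;

// Illustrative only: compares the ZIP64 format limit (unsigned 64-bit)
// with the largest offset RandomAccessFile.seek(long) can address.
public class Zip64Limits {

    // 2^64 - 1, the largest value an unsigned 64-bit field can hold.
    public static BigInteger formatMax() {
        return BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
    }

    // 2^63 - 1, the largest seek offset expressible as a Java long.
    public static BigInteger longMax() {
        return BigInteger.valueOf(Long.MAX_VALUE);
    }
}
```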

> For the streaming mode offsets are currently stored as longs but that
> could be changed to BigIntegers easily so we could reach 2^64-1 at the
> expense of memory consumption and maybe even some performance issues
> (the offsets are not really used in calculations so I don't expect any
> major impact).

No insights into the implementation, but that might be worth changing so
it's in line with the ZipFile impl.
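Switching the offset bookkeeping to BigInteger could be as simple as something like this rough sketch (not the actual ZipArchiveOutputStream internals, names made up):

```java
import java.math.BigInteger;

// Sketch of an offset counter that can exceed Long.MAX_VALUE.
// The offsets are only accumulated, never used in heavy arithmetic,
// so the BigInteger overhead should stay small.
public class OffsetCounter {

    private BigInteger written = BigInteger.ZERO;

    // Record that 'bytes' more bytes have been written to the stream.
    public void count(long bytes) {
        written = written.add(BigInteger.valueOf(bytes));
    }

    // Current offset, good up to the format's 2^64 - 1 limit and beyond.
    public BigInteger offset() {
        return written;
    }
}
```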

> Size of an individual entry (compressed or not)
> ===============================================
>
> The format supports an unsigned 64 bit integer as size, ArchiveEntry's
> get/setSize methods use long - this means there is a factor of 2.
>
> We could easily add an additional setter/getter for size that uses
> BigInteger, the infrastructure to support it would be there.  OTOH it is
> questionable whether we'd support anything > Long.MAX_VALUE in practice
> because of the previous point anyway.

Especially as this is also just for one individual entry. Again - I think
I would not bother at this stage.
Nothing that cannot be added later.
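If it does get added later, a BigInteger-based size accessor next to the long-based one might look roughly like this (hypothetical sketch, not the actual ArchiveEntry API):

```java
import java.math.BigInteger;

// Illustrative entry with both a BigInteger and a long view of the size.
public class BigEntry {

    private BigInteger size = BigInteger.ZERO;

    public void setBigSize(BigInteger s) {
        size = s;
    }

    public BigInteger getBigSize() {
        return size;
    }

    // The long-based view cannot represent anything above Long.MAX_VALUE;
    // -1 here stands in for "unknown/too big" (a made-up convention).
    public long getSize() {
        return size.bitLength() <= 63 ? size.longValue() : -1;
    }
}
```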

> Number of file entries in the archive
> =====================================
>
> This used to be an unsigned 16 bit integer and has grown to an
> unsigned 64 bit integer with ZIP64.
>
> ZipArchiveInputStream should work with arbitrary many entries.
>
> ZipArchiveOutputStream uses a LinkedList to store all entries as it has
> to keep track of the metadata in order to write the central directory.
> It also uses an additional HashMap that could be removed easily by
> storing the data together with the entries themselves.  LinkedList won't
> allow more than Integer.MAX_VALUE entries which leaves us quite a bit
> away from the theoretical limit of the format.

Hmmm.

> I'm confident that even I would manage to write an efficient singly
> linked list that is only ever appended to and that is iterated over
> exactly once from head to tail.

+1 for that then :)
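The append-only, iterate-once list described above really is small - something along these lines (a sketch, not committed code; with no int size field there's no Integer.MAX_VALUE cap on the entry count):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Minimal singly linked list: append at the tail, iterate head to tail.
public class AppendOnlyList<E> implements Iterable<E> {

    private static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private Node<E> head;
    private Node<E> tail;

    // O(1) append; the only mutation this list supports.
    public void append(E value) {
        Node<E> n = new Node<>(value);
        if (tail == null) {
            head = tail = n;
        } else {
            tail.next = n;
            tail = n;
        }
    }

    // Single forward pass from head to tail, as used when writing
    // the central directory.
    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private Node<E> cur = head;

            @Override public boolean hasNext() {
                return cur != null;
            }

            @Override public E next() {
                if (cur == null) {
                    throw new NoSuchElementException();
                }
                E v = cur.value;
                cur = cur.next;
                return v;
            }
        };
    }
}
```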

> I don't see myself writing an efficient map
> with a capacity of Long.MAX_VALUE or bigger, either.

There must be something like that out there already.
Otherwise it could be another nice addition to Collections ;)

> We could stick with documenting the limits of ZipFile properly.  In
> practice I doubt many people will have to deal with archives of 2^63
> bytes or more.  And even archives with 2^32 entries or more should be
> rare - in which case people could fall back to ZipArchiveInputStream.

Hm. Yeah ... maybe just get it out before we start implementing new
collection classes.

Cool stuff!!

cheers,
Torsten

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

