commons-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: [compress] Do we want 7z Archive*Stream-like classes
Date Sun, 06 Oct 2013 13:25:11 GMT
On 2013-10-01, Damjan Jovanovic wrote:

> On Tue, Oct 1, 2013 at 6:09 AM, Stefan Bodewig <bodewig@apache.org> wrote:

>> Reading may be simpler, here you can store the meta-information from the
>> start of the file in memory and then read entries as you go, ZipFile
>> inside the zip package does something like this.

> From what I remember:

> The "meta-information" can be anywhere in the file, as can the
> compressed files themselves. The 7zip tool seems to write the
> meta-information at the end of the 7z file when multi-file archives
> are created.

Oh yes, my understanding was pretty much wrong, and re-reading your
implementation has helped me see more clearly.  Right now I think the
important metadata actually is at the end, but there is a smaller part
at the front: in particular, a pointer to the Header holding the
metadata.
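
As a rough illustration of that front part, here is a minimal sketch
of reading the pointer, assuming the commonly documented 32-byte 7z
start header (6-byte signature, 2-byte version, 4-byte CRC, then
little-endian offset, size and CRC of the Header at the end).  The
class and method names are hypothetical, not the ones used in the
sevenz package, and the sketch skips CRC verification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical holder for the fixed-size block at the front of a 7z file. */
final class SevenZStartHeader {
    static final byte[] SIGNATURE = {'7', 'z', (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};
    static final int LENGTH = 32;

    final long nextHeaderOffset; // relative to the end of this 32-byte block
    final long nextHeaderSize;
    final int nextHeaderCrc;

    private SevenZStartHeader(long offset, long size, int crc) {
        this.nextHeaderOffset = offset;
        this.nextHeaderSize = size;
        this.nextHeaderCrc = crc;
    }

    /** Parses the first 32 bytes of the file; the CRCs are NOT checked in this sketch. */
    static SevenZStartHeader parse(byte[] first32) {
        if (first32.length < LENGTH) {
            throw new IllegalArgumentException("need at least " + LENGTH + " bytes");
        }
        if (!Arrays.equals(Arrays.copyOf(first32, SIGNATURE.length), SIGNATURE)) {
            throw new IllegalArgumentException("not a 7z signature");
        }
        ByteBuffer buf = ByteBuffer.wrap(first32).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(6 + 2 + 4); // skip signature, version and start header CRC
        long offset = buf.getLong();
        long size = buf.getLong();
        int crc = buf.getInt();
        return new SevenZStartHeader(offset, size, crc);
    }

    /** Absolute file position at which the Header holding the metadata starts. */
    long absoluteHeaderPosition() {
        return LENGTH + nextHeaderOffset;
    }
}
```

A reader would seek to absoluteHeaderPosition(), load nextHeaderSize
bytes of metadata, and only then know where each entry lives.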

> Compressed file codecs, positions, lengths, and solid compression
> details are only stored in the meta-information, so it's not possible
> to write a streaming reader without O(n) memory in the worst case.

I agree.

> Writing also requires seeking or O(n) memory, as the initial header at
> the beginning of the file contains the offset to the next header, and
> we only know the size/contents/location of the next header once all
> the files have been written.

or a temporary file to which the first header could be prepended - but
if you have that, you could use seeking as well.  So yes, I agree again.
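
To make the seeking variant concrete, here is a minimal sketch that
writes a placeholder at the front, appends the entries and the
trailing header, and then seeks back to patch in the offset and size.
It uses a toy 16-byte layout and big-endian longs (what
RandomAccessFile.writeLong produces), not the real 7z format, and all
names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

/** Hypothetical writer that back-patches a pointer at the front of the file. */
final class BackPatchingWriter {
    /** Toy front block: 8-byte offset plus 8-byte size. Not the real 7z layout. */
    static final int FRONT_LENGTH = 16;

    static void write(File target, List<byte[]> entries, byte[] trailingHeader)
            throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(target, "rw")) {
            out.setLength(0);                  // start from an empty file
            out.write(new byte[FRONT_LENGTH]); // placeholder, real values unknown yet
            for (byte[] entry : entries) {
                out.write(entry);              // entry data is written exactly once
            }
            long headerPosition = out.getFilePointer();
            out.write(trailingHeader);         // metadata goes at the end
            out.seek(0);                       // now the pointer can be filled in
            out.writeLong(headerPosition - FRONT_LENGTH); // offset relative to front block
            out.writeLong(trailingHeader.length);
        }
    }
}
```

Without seek() the same result needs either all entries buffered in
memory or the temporary file mentioned above.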

> Since we now have multiple archivers that require seeking, I suggest
> we add a SeekableStream class or something along those lines. The
> Commons Imaging project also has the same problem to solve for images,
> and it uses ByteSources, which can be arrays, files, or an InputStream
> wrapper that caches what has been read (so seeking is efficient, while
> it only reads as much from the InputStream as is necessary).

Interesting idea.
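
For illustration only, a caching InputStream wrapper of the kind
described could look roughly like the sketch below.  This is not the
Commons Imaging ByteSource API; the class and its methods are
hypothetical, it keeps everything read so far in memory, and it is
limited to streams smaller than 2 GB.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical random-access view over an InputStream that caches what it has read. */
final class CachingByteSource {
    private final InputStream in;
    private byte[] cache = new byte[0];
    private int cached;   // number of valid bytes in cache
    private boolean eof;

    CachingByteSource(InputStream in) {
        this.in = in;
    }

    /** Returns the unsigned byte at the given position, or -1 if it lies past the end. */
    int byteAt(int position) throws IOException {
        if (position < 0) {
            throw new IllegalArgumentException("negative position");
        }
        fillTo(position + 1);
        return position < cached ? cache[position] & 0xFF : -1;
    }

    /** Number of bytes pulled from the underlying stream so far. */
    int cachedLength() {
        return cached;
    }

    /** Reads from the stream only until 'wanted' bytes are cached or the stream ends. */
    private void fillTo(int wanted) throws IOException {
        if (cache.length < wanted) {
            cache = Arrays.copyOf(cache, Math.max(wanted, cache.length * 2));
        }
        while (cached < wanted && !eof) {
            int n = in.read(cache, cached, wanted - cached);
            if (n < 0) {
                eof = true;
            } else {
                cached += n;
            }
        }
    }
}
```

Seeking backwards is then free, while seeking forwards costs as much
memory as the distance skipped, which is the O(n) worst case
discussed above.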

Right now I'm willing to postpone any streaming API for 7z and would
rather cut a release with a file-only API.

Stefan


