commons-dev mailing list archives

From "dam6923 ." <dam6...@gmail.com>
Subject Re: [compress] Do we want 7z Archive*Stream-like classes
Date Sun, 06 Oct 2013 17:02:16 GMT
> Since we now have multiple archivers that require seeking, I suggest
> we add a SeekableStream class or something along those lines. The
> Commons Imaging project also has the same problem to solve for images,
> and it uses ByteSources, which can be arrays, files, or an InputStream
> wrapper that caches what has been read (so seeking is efficient, while
> it only reads as much from the InputStream as is necessary).

I would also like to advocate for this approach.  I was looking into
writing an implementation of a Google Snappy decompressor, but was
unable to wrap it effectively in an InputStream.  Having a seekable
stream would make my efforts a better fit for this library.
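
A minimal sketch of the caching-wrapper idea quoted above, assuming a
hypothetical class named CachingSeekableInput (it is not part of Commons
Compress or Commons Imaging). It pulls bytes from the underlying
InputStream only when a read reaches past what is already cached, so
seeking backwards costs nothing and seeking forwards reads no more than
needed. Positions are ints here for brevity, which caps the cache at 2 GiB.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical sketch: a seekable view over a plain InputStream. */
class CachingSeekableInput {
    private final InputStream in;
    private byte[] cache = new byte[8192];
    private int cached;    // number of bytes pulled from the stream so far
    private int position;  // current read position
    private boolean eof;   // true once the underlying stream is exhausted

    CachingSeekableInput(InputStream in) {
        this.in = in;
    }

    /** Moves the read position; nothing is read from the stream yet. */
    void seek(int newPosition) {
        if (newPosition < 0) {
            throw new IllegalArgumentException("negative position");
        }
        position = newPosition;
    }

    /** Returns the byte at the current position (0-255), or -1 past the end. */
    int read() throws IOException {
        fillTo(position + 1);
        if (position >= cached) {
            return -1;
        }
        return cache[position++] & 0xFF;
    }

    /** How many bytes have been consumed from the underlying stream. */
    int bytesFetched() {
        return cached;
    }

    /** Reads from the stream until at least 'target' bytes are cached or EOF. */
    private void fillTo(int target) throws IOException {
        if (target > cache.length) {
            cache = Arrays.copyOf(cache, Math.max(target, cache.length * 2));
        }
        while (!eof && cached < target) {
            // Ask only for the bytes still missing, never more.
            int n = in.read(cache, cached, target - cached);
            if (n < 0) {
                eof = true;
            } else {
                cached += n;
            }
        }
    }
}
```

The trade-off is the one raised later in the thread: the cache grows with
the furthest position read, so the worst case is still O(n) memory.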

On Sun, Oct 6, 2013 at 9:25 AM, Stefan Bodewig <bodewig@apache.org> wrote:
> On 2013-10-01, Damjan Jovanovic wrote:
>
>> On Tue, Oct 1, 2013 at 6:09 AM, Stefan Bodewig <bodewig@apache.org> wrote:
>
>>> Reading may be simpler, here you can store the meta-information from the
>>> start of the file in memory and then read entries as you go, ZipFile
>>> inside the zip package does something like this.
>
>> From what I remember:
>
>> The "meta-information" can be anywhere in the file, as can the
>> compressed files themselves. The 7zip tool seems to write the
>> meta-information at the end of the 7z file when multi-file archives
>> are created.
>
> Oh yes, my understanding has been pretty much wrong and re-reading your
> implementation has helped me see more clearly.  Right now I think the
> important metadata actually is at the end but there is a smaller part at
> the front - in particular a pointer to the Header holding the metadata.
>
>> Compressed file codecs, positions, lengths, and solid compression
>> details are only stored in the meta-information, so it's not possible
>> to write a streaming reader without O(n) memory in the worst case.
>
> I agree.
>
>> Writing also requires seeking or O(n) memory, as the initial header at
>> the beginning of the file contains the offset to the next header, and
>> we only know the size/contents/location of the next header once all
>> the files have been written.
>
> or a temporary file to which the first header could be prepended - but
> if you have that, you could use seeking as well.  So yes, I agree again.
>
>> Since we now have multiple archivers that require seeking, I suggest
>> we add a SeekableStream class or something along those lines. The
>> Commons Imaging project also has the same problem to solve for images,
>> and it uses ByteSources, which can be arrays, files, or an InputStream
>> wrapper that caches what has been read (so seeking is efficient, while
>> it only reads as much from the InputStream as is necessary).
>
> Interesting idea.
>
> Right now I'm willing to postpone any streaming API for 7z and rather
> cut a release with a file-only API.
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
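
The point quoted above, that writing needs seeking because the initial
header must hold the offset of a header written last, can be sketched
roughly as follows. This uses a toy container layout invented for
illustration, not the real 7z format, and the class and method names are
hypothetical. A placeholder is written first, then the entry data, then
the trailing header, and finally the writer seeks back to patch the
placeholder.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

/** Hypothetical sketch; the layout below is a toy format, not 7z. */
class PatchedHeaderWriter {

    /**
     * Writes [8-byte trailer offset][entry data...][trailer] and returns
     * the offset at which the trailer starts.
     */
    static long write(File target, List<byte[]> entries) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(target, "rw")) {
            out.setLength(0);
            out.writeLong(0L); // placeholder: trailer offset is not known yet
            for (byte[] entry : entries) {
                out.write(entry);
            }
            // Only now do we know where the trailing header lives.
            long trailerOffset = out.getFilePointer();
            out.writeInt(entries.size());
            for (byte[] entry : entries) {
                out.writeInt(entry.length);
            }
            // Seek back and patch the placeholder at the front of the file.
            out.seek(0);
            out.writeLong(trailerOffset);
            return trailerOffset;
        }
    }

    /** Follows the pointer at the front to read the entry count from the trailer. */
    static int readEntryCount(File source) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(source, "r")) {
            long trailerOffset = in.readLong();
            in.seek(trailerOffset);
            return in.readInt();
        }
    }
}
```

Without the final seek, the writer would have to buffer all entry data
in memory or in a temporary file before it could emit the first header.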
