commons-dev mailing list archives

From Simone Tripodi <simonetrip...@apache.org>
Subject Re: [compress] XZ support and inconsistencies in the existing compressors
Date Wed, 03 Aug 2011 19:59:14 GMT
Hi Lasse!
I'd personally prefer that you file an issue in Jira and submit your
XZ implementation as a patch that fits naturally in the
org.apache.commons.compress package, and that you keep contributing to
its maintenance. Depending on an external package would probably be
more difficult, since Commons components are generally self-contained
and don't depend on any third-party library unless it is itself a
Commons component.

Please take this strictly as a personal suggestion: I'm not involved
in [compress] development, so I'll leave the decision to the maintainers.

Have a nice day, all the best!
Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/



On Wed, Aug 3, 2011 at 9:22 PM, Lasse Collin <lasse.collin@tukaani.org> wrote:
> Hi!
>
> I have been working on an XZ data compression implementation in Java
> <http://tukaani.org/xz/java.html>. I was told that it could be nice
> to get XZ support into Commons Compress.
>
> I looked at the APIs and code in Commons Compress to see how XZ
> support could be added. I was especially looking for details where
> one would need to be careful to make different compressors behave
> consistently compared to each other. I found a few possible problems
> in the existing code:
>
> (1) CompressorOutputStream should have finish(). Now
>    BZip2CompressorOutputStream has finish() but
>    GzipCompressorOutputStream doesn't. This should be easy to
>    fix because java.util.zip.GZIPOutputStream supports finish().
>
> (2) BZip2CompressorOutputStream.flush() calls out.flush() but it
>    doesn't flush data buffered by BZip2CompressorOutputStream.
>    Thus not all data written to the Bzip2 stream will be available
>    in the underlying output stream after flushing. This kind of
>    flush() implementation doesn't seem very useful.
>
>    GzipCompressorOutputStream.flush() is the default version
>    from OutputStream and thus does nothing. Adding flush()
>    into GzipCompressorOutputStream is hard because
>    java.util.zip.GZIPOutputStream and java.util.zip.Deflater don't
>    support sync flushing before Java 7. To get Gzip flushing in
>    older Java versions one might need a complete reimplementation
>    of the Deflate algorithm which isn't necessarily practical.
>
> (3) BZip2CompressorOutputStream has finalize() that finishes a stream
>    that hasn't been explicitly finished or closed. This doesn't seem
>    useful. GzipCompressorOutputStream doesn't have an equivalent
>    finalize().
>
> (4) The decompressor streams don't support concatenated .gz and .bz2
>    files. This can be OK when compressed data is used inside another
>    file format or protocol, but with regular (standalone) .gz and
>    .bz2 files it is bad to stop after the first compressed stream
>    and silently ignore the remaining compressed data.
>
>    Fixing this in BZip2CompressorInputStream should be relatively
>    easy because it stops right after the last byte of the compressed
>    stream. Fixing GzipCompressorInputStream is harder because the
>    problem is inherited from java.util.zip.GZIPInputStream
>    which reads input past the end of the first stream. One
>    might need to reimplement .gz container support on top of
>    java.util.zip.InflaterInputStream or java.util.zip.Inflater.
>
> The XZ compressor supports finish() and flush(). The XZ decompressor
> supports concatenated .xz files, but there is also a single-stream
> version that behaves similarly to the current version of
> BZip2CompressorInputStream.
>
> Assuming that there will be some interest in adding XZ support into
> Commons Compress, is it OK to make Commons Compress depend on the XZ
> package org.tukaani.xz, or should the XZ code be modified so that
> it could be included as an internal part in Commons Compress? I
> would prefer depending on org.tukaani.xz because then there is
> just one code base to keep up to date.
>
> --
> Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>
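Point (1) in the quoted message can be illustrated with a minimal sketch. The wrapper class FinishableGzipOutputStream below is hypothetical and is not the Commons Compress class; it only assumes that java.util.zip.GZIPOutputStream.finish() completes the gzip stream without closing the underlying stream, which is what the message relies on.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical wrapper (not the Commons Compress implementation) that
// exposes finish() by delegating to java.util.zip.GZIPOutputStream.
final class FinishableGzipOutputStream extends OutputStream {
    private final GZIPOutputStream out;

    FinishableGzipOutputStream(OutputStream sink) throws IOException {
        this.out = new GZIPOutputStream(sink);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
    }

    /** Writes the remaining compressed data and the gzip trailer; the sink stays open. */
    public void finish() throws IOException {
        out.finish();
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    /** Finishes the stream if needed and closes the sink as well. */
    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

After finish() the bytes in the sink should form a complete .gz member, and the caller can still write further data to the sink, for example when the compressed stream is embedded in another container format.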

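The flushing issue in point (2) can be sketched as follows, assuming Java 7 or later, where GZIPOutputStream has a constructor with a syncFlush flag. The class and method names here are made up for illustration. The sketch takes a snapshot of the underlying stream right after flush() and inflates whatever is decodable at that moment; with syncFlush set to true, all data written so far should be recoverable, while with false the data may still be buffered in the Deflater.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Illustrative helper; names are hypothetical.
final class GzipSyncFlushSketch {
    // GZIPOutputStream writes a fixed 10-byte gzip header (no optional fields).
    static final int GZIP_HEADER_SIZE = 10;

    /** Returns the bytes present in the sink immediately after flush(). */
    static byte[] compressedAfterFlush(byte[] data, boolean syncFlush) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(sink, syncFlush); // Java 7+
        gz.write(data);
        gz.flush();
        byte[] snapshot = sink.toByteArray(); // taken before the stream is finished
        gz.close();
        return snapshot;
    }

    /** Inflates as much as possible from a partial gzip snapshot. */
    static byte[] inflateAvailable(byte[] snapshot) throws DataFormatException {
        Inflater inflater = new Inflater(true); // raw deflate, header skipped by hand
        inflater.setInput(snapshot, GZIP_HEADER_SIZE, snapshot.length - GZIP_HEADER_SIZE);
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = inflater.inflate(buf)) > 0) {
            result.write(buf, 0, n);
        }
        inflater.end();
        return result.toByteArray();
    }
}
```

Calling inflateAvailable(compressedAfterFlush(data, true)) should give back the original bytes, which is the behaviour one would expect from a useful flush().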

