commons-issues mailing list archives

From "Damjan Jovanovic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-111) support for lzma files
Date Tue, 07 May 2013 21:03:16 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651308#comment-13651308 ]

Damjan Jovanovic commented on COMPRESS-111:
-------------------------------------------

The fundamental problem is that Commons Compress does decompression via CompressorInputStream’s
read() methods, which are a pull-model interface. The LZMA SDK (in the public domain) instead
uses Decoder.code(), a method that takes a compressed input stream and an output stream to
decompress to, then reads, decompresses, and writes, returning only when the entire file is
decompressed. There is no direct way to convert this into a pull-model CompressorInputStream:
either you pull in one thread while pushing from another, or you push everything into a
ByteArrayOutputStream (which needs O(n) memory!!) and afterwards pull from a ByteArrayInputStream
over the result. Both are really ugly solutions: a thread per stream is heavy, and creating new
threads is not allowed in some environments (e.g. unsigned applets and Java EE servers), while
trying to allocate O(n) memory can raise an OutOfMemoryError that affects the entire JVM.

The existing Java LZMA implementations compare as follows:

Maurel’s patch here uses O(n) memory: it decompresses the entire stream in the constructor,
stores the result in a ByteArrayInputStream, and then copies from that on each read().
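A minimal Java sketch of that buffering approach follows. It assumes a hypothetical push-model PushDecoder interface shaped like the SDK's Decoder.code() (the real SDK signature may differ), with an identity "decoder" standing in for LZMA. The names PushDecoder, IDENTITY, FullyBufferedInputStream and decodeViaBuffer are illustrative only and are not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferedPushAdapter {

    /** Hypothetical push-model decoder: consumes all of `in`, writes all output to `out`, then returns. */
    interface PushDecoder {
        void code(InputStream in, OutputStream out) throws IOException;
    }

    /** Stand-in "decoder" that just copies bytes (assumption: a real LZMA decoder has the same shape). */
    static final PushDecoder IDENTITY = (in, out) -> {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    };

    /**
     * Pull-model view built the O(n) way: the constructor runs the push decoder to
     * completion and keeps the entire decompressed output in memory.
     */
    static final class FullyBufferedInputStream extends InputStream {
        private final ByteArrayInputStream buffer;

        FullyBufferedInputStream(InputStream compressed, PushDecoder decoder) throws IOException {
            ByteArrayOutputStream all = new ByteArrayOutputStream();
            decoder.code(compressed, all);                        // returns only when everything is decoded
            buffer = new ByteArrayInputStream(all.toByteArray()); // toByteArray() makes a second full copy
        }

        @Override
        public int read() {
            return buffer.read();
        }

        @Override
        public int read(byte[] b, int off, int len) {
            return buffer.read(b, off, len);
        }
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Convenience wrapper: decode `compressed` through the fully buffered adapter. */
    static byte[] decodeViaBuffer(byte[] compressed, PushDecoder decoder) {
        try (InputStream in = new FullyBufferedInputStream(new ByteArrayInputStream(compressed), decoder)) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello, lzma".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decodeViaBuffer(data, IDENTITY), StandardCharsets.US_ASCII));
    }
}
```

The caller gets nothing back until the decoder has consumed the whole input. The decompressed bytes are then held in memory in full, and briefly twice while toByteArray() copies them. That is the O(n) cost described above.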

http://jponge.github.io/lzma-java/ is licensed ASLv2 and states how it solved the push/pull
problem: “Although not a derivate work, the streaming api classes were inspired from the
work of Christopher League. I reused his technique of fake streams and working threads to
pass the data around between encoders/decoders and "normal" Java streams.” In other words,
it pushes in one thread and pulls in another. Actual decompression in the other thread is
still done with the LZMA SDK, which it just wraps into an InputStream subclass.
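The fake-stream-plus-worker-thread technique looks roughly like the sketch below. It again assumes a hypothetical PushDecoder in place of the real LZMA SDK decoder and is not lzma-java's actual code. The worker pushes into a java.io.PipedOutputStream while the caller pulls from the connected PipedInputStream. Memory therefore stays bounded by the pipe buffer, at the cost of one thread per open stream. PipedDecodingStream, decodingStream and decodeViaThread are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicReference;

public class ThreadedPushAdapter {

    /** Hypothetical push-model decoder: consumes all of `in`, writes all output to `out`, then returns. */
    interface PushDecoder {
        void code(InputStream in, OutputStream out) throws IOException;
    }

    /** Stand-in "decoder" that just copies bytes. */
    static final PushDecoder IDENTITY = (in, out) -> {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    };

    /** Pull side of the pipe; reports a decoder failure instead of a silent, truncated EOF. */
    static final class PipedDecodingStream extends FilterInputStream {
        private final AtomicReference<Throwable> failure;

        PipedDecodingStream(PipedInputStream pipe, AtomicReference<Throwable> failure) {
            super(pipe);
            this.failure = failure;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b == -1) checkFailure();
            return b;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            if (n == -1) checkFailure();
            return n;
        }

        private void checkFailure() throws IOException {
            Throwable t = failure.get();
            if (t != null) throw new IOException("decoder thread failed", t);
        }
    }

    /** Runs the push decoder on a worker thread and returns a pull-model stream over its output. */
    static InputStream decodingStream(InputStream compressed, PushDecoder decoder) throws IOException {
        PipedInputStream pullSide = new PipedInputStream(64 * 1024); // bounded buffer, not O(n)
        PipedOutputStream pushSide = new PipedOutputStream(pullSide);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                decoder.code(compressed, pushSide); // blocks whenever the pipe buffer is full
            } catch (Throwable t) {
                failure.set(t);                     // record the error before signalling EOF
            } finally {
                try {
                    pushSide.close();               // lets the reader see end-of-stream
                } catch (IOException ignored) {
                    // nothing useful to do here
                }
            }
        }, "push-decoder");
        worker.setDaemon(true);
        worker.start();
        return new PipedDecodingStream(pullSide, failure);
    }

    static byte[] decodeViaThread(byte[] compressed, PushDecoder decoder) {
        try (InputStream in = decodingStream(new ByteArrayInputStream(compressed), decoder)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The failure handoff matters. Without it, a decoder exception would just close the pipe, and the reader would see a silently truncated stream.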

http://contrapunctus.net/league/haques/lzmajio/ was written by Christopher League. It is licensed
under “LGPL or the Common Public License” and takes the same approach of pushing in one thread
and pulling in another. It’s also just a wrapper around the LZMA SDK.

http://tukaani.org/xz/java.html is in the public domain and is already used by Commons Compress
to provide XZ compression support. It supports only XZ and LZMA2, and supports them well: it
provides a proper pull-model InputStream with no O(n) memory and no background threads. LZMA2 is
a different file format from LZMA, but LZMA2 uses LZMA internally, so I’ll have to investigate
in detail.
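For contrast, here is a sketch of what a proper pull-model decoder looks like. It uses a made-up run-length format of (count byte, value byte) pairs rather than LZMA, because the point is only the shape: read() pulls just enough compressed input to produce the next byte. There is no background thread, and the decoder's state does not grow with the stream. RunLengthInputStream and decode() are hypothetical names, and XZ for Java's real decoders are far more involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class PullModelSketch {

    /**
     * Toy pull-model decoder for a hypothetical run-length format made of
     * (count, value) byte pairs. Each read() pulls only as much compressed input
     * as it needs for the next output byte, so state is O(1) and no thread is used.
     */
    static final class RunLengthInputStream extends InputStream {
        private final InputStream in;
        private int remaining; // output bytes left in the current run
        private int value;     // byte value of the current run

        RunLengthInputStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            while (remaining == 0) {            // need the next (count, value) pair
                int count = in.read();
                if (count == -1) {
                    return -1;                  // clean end of the compressed stream
                }
                int v = in.read();
                if (v == -1) {
                    throw new EOFException("truncated run-length pair");
                }
                remaining = count;              // a zero count just loops to the next pair
                value = v;
            }
            remaining--;
            return value;
        }

        // Overridden because InputStream's default bulk read swallows errors after the first byte.
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int n = 0;
            while (n < len) {
                int c = read();
                if (c == -1) {
                    break;
                }
                b[off + n++] = (byte) c;
            }
            return n == 0 ? -1 : n;
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    static byte[] decode(byte[] compressed) {
        try (InputStream in = new RunLengthInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An LZMA decoder in this style has to keep its dictionary and range-coder state between read() calls, rather than keeping it on the stack of a single code() call. That design difference separates a pull-model implementation from a wrapper around Decoder.code().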
                
> support for lzma files
> ----------------------
>
>                 Key: COMPRESS-111
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-111
>             Project: Commons Compress
>          Issue Type: New Feature
>          Components: Compressors
>    Affects Versions: 1.0
>            Reporter: maurel jean francois
>         Attachments: compress-trunk-lzmaRev0.patch, compress-trunk-lzmaRev1.patch
>
>
> adding support for compressing and decompressing files with the LZMA algorithm (Lempel-Ziv-Markov
> chain algorithm)
> (see http://markmail.org/search/?q=list%3Aorg.apache.commons.users/#query:list%3Aorg.apache.commons.users%2F+page:1+mid:syn4uuvbzusevtko+state:results)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
