commons-issues mailing list archives

From "Damjan Jovanovic (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (COMPRESS-111) support for lzma files
Date Wed, 08 May 2013 07:43:18 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651683#comment-13651683 ]

Damjan Jovanovic edited comment on COMPRESS-111 at 5/8/13 7:42 AM:
-------------------------------------------------------------------

Compared to the normal way of extracting a file from an archive (read->decompress->write),
the temp-file solution requires read->decompress->write-temp->read-temp->write,
increasing I/O time in proportion to the size of the decompressed file (i.e. at least doubling
it), which is why I didn't even consider it.
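
A rough sketch of the two paths, using plain java.io streams (the class and method names
are made up for illustration, not the Commons Compress API):

{code:java}
import java.io.*;
import java.nio.file.*;

public class ExtractionPaths {
    // Streaming: read -> decompress -> write.
    // Each decompressed byte crosses the disk once.
    static void streamingExtract(InputStream decompressed, OutputStream dest)
            throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = decompressed.read(buf)) != -1) {
            dest.write(buf, 0, n);
        }
    }

    // Temp-file: read -> decompress -> write-temp -> read-temp -> write.
    // Every decompressed byte is written to disk and read back, so the
    // extra I/O grows with the decompressed size.
    static void tempFileExtract(InputStream decompressed, OutputStream dest)
            throws IOException {
        Path temp = Files.createTempFile("extract", ".tmp");
        try {
            try (OutputStream tmpOut = Files.newOutputStream(temp)) {
                streamingExtract(decompressed, tmpOut); // pass 1: write-temp
            }
            try (InputStream tmpIn = Files.newInputStream(temp)) {
                streamingExtract(tmpIn, dest);          // pass 2: read-temp -> write
            }
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
{code}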

It seems like LZMA2 breaks the stream to be compressed into blocks, and can (de)compress
the blocks independently of each other (which has the benefit of allowing fast, multi-threaded
decompression). In Lasse's code, LZMA2InputStream uses O(n) memory per block, in the method
RangeDecoder.prepareInputBuffer() called from LZMA2InputStream.decodeChunkHeader(). For LZMA,
however, the "block" is the entire file, so decoding would buffer the whole compressed stream
in memory. Luckily it seems pretty easy to patch RangeDecoder to read incrementally.
LZMA2InputStream probably has to be modified as well, as I don't think LZMA has a chunk
header. I don't know what else may be necessary.
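
To sketch what "read incrementally" could look like (hypothetical class, not the actual
XZ for Java API): instead of prepareInputBuffer() allocating a buffer for the whole chunk
up front, the decoder would refill a small fixed buffer from the underlying stream on demand.

{code:java}
import java.io.*;

// Hypothetical incremental input for a range decoder: a fixed-size buffer
// refilled from the stream, instead of buffering the entire chunk (which
// for LZMA would mean the entire file).
class IncrementalRangeDecoderInput {
    private final InputStream in;
    private final byte[] buf = new byte[4096]; // O(1) memory, independent of chunk size
    private int pos = 0;
    private int limit = 0;

    IncrementalRangeDecoderInput(InputStream in) {
        this.in = in;
    }

    // Returns the next compressed byte, refilling the buffer when it runs dry.
    int nextByte() throws IOException {
        if (pos == limit) {
            limit = in.read(buf);
            pos = 0;
            if (limit <= 0)
                throw new EOFException("unexpected end of compressed data");
        }
        return buf[pos++] & 0xFF;
    }
}
{code}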

Oh, and even if LZMA is a legacy format, we still need it for reading .7z files, whose
header compression (enabled by default) always uses LZMA.
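
For context, this is roughly how reading a .7z archive looks with the SevenZFile class in
trunk (a sketch only; the API may still change before a release). Just opening the archive
already exercises the LZMA decoder, because the header itself is LZMA-compressed by default:

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class SevenZExtract {
    public static void main(String[] args) throws Exception {
        // The constructor parses the archive header, which means
        // decompressing it with LZMA when header compression is on.
        SevenZFile sevenZ = new SevenZFile(new File(args[0]));
        try {
            SevenZArchiveEntry entry;
            byte[] buf = new byte[8192];
            while ((entry = sevenZ.getNextEntry()) != null) {
                if (entry.isDirectory())
                    continue;
                FileOutputStream out = new FileOutputStream(entry.getName());
                try {
                    int n;
                    while ((n = sevenZ.read(buf)) != -1)
                        out.write(buf, 0, n);
                } finally {
                    out.close();
                }
            }
        } finally {
            sevenZ.close();
        }
    }
}
{code}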

                
> support for lzma files
> ----------------------
>
>                 Key: COMPRESS-111
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-111
>             Project: Commons Compress
>          Issue Type: New Feature
>          Components: Compressors
>    Affects Versions: 1.0
>            Reporter: maurel jean francois
>         Attachments: compress-trunk-lzmaRev0.patch, compress-trunk-lzmaRev1.patch
>
>
> adding support for compressing and decompressing files with the LZMA algorithm (Lempel-Ziv-Markov chain algorithm)
> (see http://markmail.org/search/?q=list%3Aorg.apache.commons.users/#query:list%3Aorg.apache.commons.users%2F+page:1+mid:syn4uuvbzusevtko+state:results)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
