commons-user mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: [compress] Decompressing bzip2 binary produced by Python bz2?
Date Sun, 25 May 2014 16:50:54 GMT
On 2014-05-25, He Shiming wrote:

> According to
> http://commons.apache.org/proper/commons-compress/apidocs/src-html/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.html,
> the exception is thrown while detecting a 'bz2' file header of 'BZh' +
> '1'. On top of that, there appear to be other segment headers it
> requires.

The 1 is the block size (in units of 100kB) and can be any number
between 1 and 9.  This information is crucial for BZip2 to work
properly.  Since the format compresses each block as a whole, the
minimum amount of data you can decompress is one such block, including
all its metadata such as the Huffman tables used.  It is impossible to
start decompression in the middle of a block.
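
As a rough illustration of the header check described above, here is a
minimal sketch in plain Java.  It is not taken from Commons Compress;
the class and method names are made up for this example, and it assumes
that "100kB" means 100,000 bytes.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper, not part of Commons Compress. */
final class Bzip2HeaderCheck {

    /** Returns the block size in bytes declared by a "BZh" + digit header. */
    static int declaredBlockSize(byte[] data) {
        if (data.length < 4 || data[0] != 'B' || data[1] != 'Z' || data[2] != 'h') {
            throw new IllegalArgumentException("missing 'BZh' stream header");
        }
        // The fourth byte is an ASCII digit '1'..'9': the block size in 100kB units.
        int level = data[3] - '0';
        if (level < 1 || level > 9) {
            throw new IllegalArgumentException("block size digit must be 1-9");
        }
        return level * 100_000; // assumption: 100kB == 100,000 bytes
    }

    public static void main(String[] args) {
        byte[] header = "BZh9".getBytes(StandardCharsets.US_ASCII);
        System.out.println(declaredBlockSize(header)); // prints 900000
    }
}
```

A chunk that does not begin with these four bytes fails this check,
which is the situation the quoted exception describes.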

In addition, Compress' API won't allow you to start decompressing
anywhere else but at the very start of the file.  It wouldn't be too
hard to add a different mode to BZip2CompressorInputStream which would
at least need to know the size of the buffer and could start working on
a full compressed buffer - but it is not possible without modifying the
class itself.

Basically you'd need to add a new constructor accepting a stream and
the block size as arguments, manually set a few member variables that
otherwise would get set in init and proceed to initBlock immediately.
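
To make the shape of that change concrete, here is a hypothetical
stand-in class in plain Java.  It is NOT the real
BZip2CompressorInputStream and does no decompression; the class name,
field name and accessors are assumptions chosen for illustration.  It
only contrasts the usual path (block size read from the header) with
the proposed one (block size passed in by the caller).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in; this is NOT Commons Compress code. */
final class BlockStartSketch {
    private final InputStream in;
    private final int blockSize100k; // illustrative field name

    /** Usual path: the block size comes from the "BZh" + digit header. */
    BlockStartSketch(InputStream in) {
        this.in = in;
        this.blockSize100k = readHeader(in);
    }

    /** Proposed path: the caller supplies the block size, no header is read. */
    BlockStartSketch(InputStream in, int blockSize100k) {
        if (blockSize100k < 1 || blockSize100k > 9) {
            throw new IllegalArgumentException("block size must be 1-9");
        }
        this.in = in;
        this.blockSize100k = blockSize100k;
        // A real implementation would now go straight to parsing the first block.
    }

    private static int readHeader(InputStream in) {
        try {
            if (in.read() != 'B' || in.read() != 'Z' || in.read() != 'h') {
                throw new IllegalArgumentException("missing 'BZh' stream header");
            }
            int level = in.read() - '0';
            if (level < 1 || level > 9) {
                throw new IllegalArgumentException("block size digit must be 1-9");
            }
            return level;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int blockSize100k() {
        return blockSize100k;
    }

    InputStream stream() {
        return in;
    }

    public static void main(String[] args) {
        // Headerless data: the block size must be known out of band.
        InputStream raw = new ByteArrayInputStream(new byte[] {0x31, 0x41});
        System.out.println(new BlockStartSketch(raw, 9).blockSize100k()); // prints 9
    }
}
```

The second constructor consumes nothing from the stream, so the caller
is responsible for positioning it at the start of a block and for
knowing the block size the compressor used.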

Stefan
