commons-user mailing list archives

From Stefan Bodewig <>
Subject Re: [compress] Decompressing bzip2 binary produced by Python bz2?
Date Sun, 25 May 2014 16:50:54 GMT
On 2014-05-25, He Shiming wrote:

> According to
> the exception is thrown while detecting a 'bz2' file header of 'BZh' +
> '1'. On top of that, there appear to be other segment headers it
> requires.

The 1 is the block size (in units of 100kB) and can be any digit
between 1 and 9.  This information is crucial for BZip2 to work
properly.  Since the format compresses each block as a whole, the
smallest unit of data you can decompress is one complete block,
including all its metadata such as the Huffman tables used.  It is
impossible to start decompression in the middle of such a block.
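As a rough illustration of the header described above, here is a
minimal Java sketch that checks the 4-byte stream header ("BZh"
followed by '1'..'9') and turns the digit into a block size in bytes.
The class and method names are made up for this example and are not
part of Commons Compress; it assumes the 100,000-byte unit that the
bzip2 format uses for its block size.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons Compress.
final class Bzip2HeaderSketch {

    // bzip2 block sizes are multiples of 100,000 bytes ("100k").
    static final int BASE_BLOCK_SIZE = 100_000;

    private Bzip2HeaderSketch() {}

    // Validates the "BZh" magic plus the block size digit and returns
    // the block size in bytes (digit * 100,000).
    static int blockSizeFromHeader(byte[] data) {
        if (data.length < 4 || data[0] != 'B' || data[1] != 'Z' || data[2] != 'h') {
            throw new IllegalArgumentException("missing 'BZh' stream magic");
        }
        int level = data[3] - '0';
        if (level < 1 || level > 9) {
            throw new IllegalArgumentException("block size digit must be '1'..'9'");
        }
        return level * BASE_BLOCK_SIZE;
    }

    public static void main(String[] args) {
        byte[] header = "BZh9".getBytes(StandardCharsets.US_ASCII);
        System.out.println(blockSizeFromHeader(header)); // prints 900000
    }
}
```

A header without a valid digit (for example "BZh0") is rejected,
which mirrors the kind of header check the original question ran into.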

In addition, Compress' API won't allow you to start decompressing
anywhere but at the very start of the file.  It wouldn't be too hard
to add a different mode to BZip2CompressorInputStream that would at
least need to know the size of the buffer (the block size) and could
then start working on a full compressed buffer, but that is not
possible without modifying the class itself.
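One reason you can't simply seek to some offset and start decoding:
each compressed block begins with the 48-bit magic number
0x314159265359, and blocks are packed at bit granularity, so a block
usually does not start on a byte boundary.  The sketch below (names
are hypothetical, not Compress API) scans a buffer bit by bit for that
marker, similar in spirit to what the bzip2recover tool does.  Note
that the 48-bit pattern can also occur by chance inside compressed
data, and knowing where a block starts still doesn't let the
unmodified class decode from there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons Compress.
final class Bzip2BlockScanner {

    // 48-bit marker at the start of every compressed block.
    static final long BLOCK_MAGIC = 0x314159265359L;
    private static final long MASK_48 = (1L << 48) - 1;

    private Bzip2BlockScanner() {}

    // Returns every bit offset (MSB-first bit order, as bzip2 writes
    // bits) at which the block magic appears. Matches may be false
    // positives if the pattern occurs inside compressed data.
    static List<Long> findBlockMagicBitOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        long window = 0;
        long bitsSeen = 0;
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                // Shift the next bit into a sliding 48-bit window.
                window = ((window << 1) | ((b >> i) & 1)) & MASK_48;
                bitsSeen++;
                if (bitsSeen >= 48 && window == BLOCK_MAGIC) {
                    offsets.add(bitsSeen - 48);
                }
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Stream header "BZh9" followed directly by a block magic.
        byte[] sample = {'B', 'Z', 'h', '9', 0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
        System.out.println(findBlockMagicBitOffsets(sample));
    }
}
```

In a normal stream the first block starts at bit 32, right after the
4-byte header; later blocks can start at any bit position.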

Basically, you'd need to add a new constructor accepting a stream and
the block size as arguments, manually set a few member variables that
would otherwise get set in init, and proceed to initBlock immediately.
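This is not the constructor change described above, but for one narrow
case a similar effect can be had without touching the class: if the
data is a complete bzip2 stream that merely lacks its 4-byte "BZh" +
digit header, and you know the block size digit, you can put a
synthesized header back in front.  Because the header is exactly 32
bits, the first block then lands where the decoder expects it.  The
helper below is a hypothetical sketch using only java.io; it does not
help with data that starts in the middle of a stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons Compress.
final class HeaderlessBzip2Streams {

    private HeaderlessBzip2Streams() {}

    // Prepends "BZh" + blockSize100k to a stream whose header was
    // stripped. Assumes the remaining bytes begin exactly at the first
    // block of an otherwise intact bzip2 stream.
    static InputStream withStreamHeader(InputStream headerless, int blockSize100k) {
        if (blockSize100k < 1 || blockSize100k > 9) {
            throw new IllegalArgumentException("blockSize100k must be 1..9");
        }
        byte[] header = ("BZh" + blockSize100k).getBytes(StandardCharsets.US_ASCII);
        return new SequenceInputStream(new ByteArrayInputStream(header), headerless);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = withStreamHeader(new ByteArrayInputStream(new byte[] {0x31, 0x41}), 9);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            sb.append((char) in.read());
        }
        System.out.println(sb); // prints BZh
    }
}
```

The resulting stream could then be handed to the existing
`new BZip2CompressorInputStream(InputStream)` constructor as usual.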

