commons-dev mailing list archives

From Stefan Bodewig
Subject Re: [compress] XZ support and inconsistencies in the existing compressors
Date Sat, 06 Aug 2011 04:53:58 GMT

>>>> Is this <>?

On 2011-08-04, Lasse Collin wrote:

> On 2011-08-04 Stefan Bodewig wrote:
>> On 2011-08-04, Lasse Collin wrote:

>>> Yes. I didn't check the suggested fix though.

>> Would be nice if you'd find the time to do so.

> It uses in.available() == 0. It duplicates the test for "BZh" magic
> bytes and a little more from init() into complete(). I think this bug
> can be fixed in a nicer way.

Patches welcome ;-)
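
One alternative to an in.available() check might be to peek at the next
few bytes and push them back if they are not consumed.  A rough sketch
only, not the patch attached to the issue: the class and method names
are made up for illustration, and the caller is assumed to wrap the raw
stream in a PushbackInputStream with room for at least three bytes.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Commons Compress.
class Bzip2StreamProbe {

    /**
     * Returns true if the next three bytes are the "BZh" magic.
     * All bytes read are pushed back, so the stream position is unchanged.
     * The PushbackInputStream must have been created with size >= 3.
     */
    static boolean startsWithBzip2Magic(PushbackInputStream in) throws IOException {
        byte[] buf = new byte[3];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break; // end of input before three bytes were available
            }
            n += r;
        }
        if (n > 0) {
            in.unread(buf, 0, n);
        }
        return n == 3 && buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h';
    }
}
```

A multi-stream decoder could call this after finishing one stream and
go through init() again only when it returns true.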

> Is there a need to have a bzip2 decompressor that does stop after the
> first stream (like the current code does)? Maybe .zip needs it?

Currently .zip doesn't use bzip2 at all and I don't think it will do so
before 2.x as I'd like to rework the API so that people could add their
own compression/encryption algos.  In JIRA there is at least one entry
where somebody has a company-owned implementation of one of the
compression algos (can't recall the details) and would like to hook that
into ZIP.

I see us defining a more generic Encoder/Decoder API, maybe similar to and
use that inside ZIP, basing a BZIP2 implementation on the current
codebase.
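
To make that a bit more concrete, here is a purely hypothetical sketch
of the decoding half of such an API.  None of these names exist in the
current codebase; the only facts assumed are the ZIP method codes 0
(stored) and 8 (deflated), and that ZIP stores raw deflate data without
a zlib header.  The encoding half would mirror it with OutputStreams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical names throughout; made up for illustration only.
class ZipCodecRegistry {

    /** Wraps the raw entry data in a stream that yields uncompressed bytes. */
    interface Decoder {
        InputStream decode(InputStream raw) throws IOException;
    }

    private final Map<Integer, Decoder> decoders = new HashMap<Integer, Decoder>();

    ZipCodecRegistry() {
        // Method 0: stored, the data is passed through unchanged.
        register(0, new Decoder() {
            public InputStream decode(InputStream raw) {
                return raw;
            }
        });
        // Method 8: deflated; nowrap=true means raw deflate without zlib header.
        register(8, new Decoder() {
            public InputStream decode(InputStream raw) {
                return new InflaterInputStream(raw, new Inflater(true));
            }
        });
    }

    /** Lets users plug in their own algorithm for a given method code. */
    void register(int method, Decoder decoder) {
        decoders.put(method, decoder);
    }

    Decoder decoderFor(int method) throws IOException {
        Decoder d = decoders.get(method);
        if (d == null) {
            throw new IOException("unsupported compression method " + method);
        }
        return d;
    }
}
```

Somebody with their own implementation could then call register() with
a Decoder of their own; 12 is the method code the APPNOTE assigns to
bzip2.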

>> We'll need standalone compressors for other formats as well (and we do
>> need LZMA 8-).  Some of the options your code provides might be
>> interesting for the ZIP package as well when we want to implement some
>> of the other supported methods.

> The .lzma format is legacy. While it may have some uses, people should
> usually move to .xz and LZMA2.

But they may already have to deal with .lzma files because they exist,
or because a process exists that requires them to write .lzmas.  I just
read that .tar.lzma could be used inside Debian packages.

> The .zip format has LZMA marked as "Early Feature Specification". Minor
> details are a little bit weird. For example, it requires storing the
> LZMA SDK version that was used for compression (what if you don't use
> unmodified LZMA SDK).

A lot of things inside the ZIP spec are "a little bit weird".  The
problem I had with Java7's interpretation of the APPNOTE when it comes
to data descriptors is such a symptom.  It turned out there is a comment
stretching more than 70 lines in InfoZIP's code explaining their
interpretation and why they do so.  Well worth reading (zipfile.c in
zip30's source code, lines 5527ff, sorry no public source code repo I
could point to).  It contains the lines

    /* This is rather klugy as the AppNote handles this poorly.
       This was the old thought:
       After discussions with other groups this is the current thinking:
       Apparent industry interpretation for data descriptors:
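
For context, the record in question is small: CRC-32, compressed size
and uncompressed size, optionally preceded by the signature 0x08074b50,
which the APPNOTE describes as commonly adopted but not required.  A
minimal sketch of a tolerant reader for the non-ZIP64 case; the class
name is made up, and this is neither InfoZIP's nor Commons Compress'
code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
class DataDescriptorSketch {

    static final int SIGNATURE = 0x08074b50;

    /**
     * Reads a non-ZIP64 data descriptor starting at offset and returns
     * {crc32, compressedSize, uncompressedSize} as unsigned values.
     */
    static long[] parse(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(offset);
        int first = buf.getInt();
        // Heuristic: a leading 0x08074b50 is taken to be the optional
        // signature.  This is ambiguous if a real CRC has that value.
        int crc = (first == SIGNATURE) ? buf.getInt() : first;
        long compressed = buf.getInt() & 0xFFFFFFFFL;
        long uncompressed = buf.getInt() & 0xFFFFFFFFL;
        return new long[] { crc & 0xFFFFFFFFL, compressed, uncompressed };
    }
}
```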

> What else needs LZMA? Do you plan .7z support?

Eventually.  There is a feature request for it.  It would probably be
best to "simply" base it on the public domain 7Zip SDK.

