hadoop-common-issues mailing list archives

From "Nicholas Carlini (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-6837) Support for LZMA compression
Date Fri, 06 Aug 2010 21:49:19 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Carlini updated HADOOP-6837:
-------------------------------------

    Attachment: HADOOP-6837-lzma-2-20100806.patch

I found a bug where the Java compressor set a badly wrong dictionary size. Instead of setting
the dictionary size to, say, 2^24, it set it to 24 (which was then forced up to the minimum
size, but was still very, very wrong).
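
A minimal Java sketch of this class of bug, assuming the dictionary size is configured as a
base-2 exponent (e.g. 24) that must be converted to a byte count before use. DictSizeDemo,
its methods and MIN_DICT_SIZE (4 KiB here) are hypothetical names and values chosen for
illustration; they are not the patch's code or the LZMA SDK's API.

```java
public final class DictSizeDemo {
    // Hypothetical lower bound on the dictionary size (assumed, for illustration only).
    public static final int MIN_DICT_SIZE = 1 << 12;

    // Buggy version: treats the exponent itself as a byte count, so 24 becomes
    // 24 bytes and is silently clamped up to the minimum.
    public static int buggyDictSize(int dictLog) {
        return Math.max(dictLog, MIN_DICT_SIZE);
    }

    // Fixed version: converts the exponent to a byte count (2^dictLog) first,
    // then applies the same lower bound.
    public static int fixedDictSize(int dictLog) {
        if (dictLog < 0 || dictLog > 30) {
            // 1 << 31 would overflow a signed int
            throw new IllegalArgumentException("dictLog out of range: " + dictLog);
        }
        return Math.max(1 << dictLog, MIN_DICT_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(buggyDictSize(24)); // 4096
        System.out.println(fixedDictSize(24)); // 16777216
    }
}
```

The clamp hides the bug: the buggy call still yields a legal (minimum) size, so compression
keeps working, only with a far smaller dictionary and a worse compression ratio.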

I also added a fairly long package.html (~3000 words) documenting how my changes work, so
anyone else who wants to modify the code hopefully won't need to spend forever exploring it
again to figure out how it works.

I was also both right and wrong about giveMore(). I was right when I first wrote it, and
wrong when I said I had fixed it by using the return value. The return values were actually
left over from when I checked for the end of stream on the Java side; I then realized that
(because of the semi-circular buffer) Java could signal an EOF that was not really the end
of the stream. So I had moved that check to the C code and simply never removed the old code
from the Java side.
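
The EOF pitfall above is a general property of circular buffers: "nothing is buffered right
now" does not mean "the input has ended". The sketch below is a hypothetical illustration of
that point only, not the patch's semi-circular buffer or giveMore(); RingBufferEofDemo and
its methods are invented names. The producer stands in for the side that actually knows when
input is exhausted, which is why the end-of-stream decision belongs there.

```java
public final class RingBufferEofDemo {
    private final byte[] buf;
    private int head;                 // next index to read
    private int tail;                 // next index to write
    private int count;                // bytes currently buffered
    private boolean producerFinished; // set only once input has truly ended

    public RingBufferEofDemo(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        buf = new byte[capacity];
    }

    // Producer side: returns false (and stores nothing) if the buffer is full.
    public boolean put(byte b) {
        if (count == buf.length) return false;
        buf[tail] = b;
        tail = (tail + 1) % buf.length; // wrap around
        count++;
        return true;
    }

    // Producer side: called once no more input will ever arrive.
    public void finish() { producerFinished = true; }

    // Consumer side: next byte as 0-255, or -1 if nothing is buffered right now.
    public int take() {
        if (count == 0) return -1;
        int b = buf[head] & 0xFF;
        head = (head + 1) % buf.length;
        count--;
        return b;
    }

    // Naive check: an empty buffer only means the producer has not refilled it yet.
    public boolean looksLikeEof() { return count == 0; }

    // Correct check: empty AND the producer has declared the input finished.
    public boolean isEof() { return count == 0 && producerFinished; }

    public static void main(String[] args) {
        RingBufferEofDemo rb = new RingBufferEofDemo(4);
        rb.put((byte) 1);
        rb.put((byte) 2);
        rb.take();
        rb.take();
        // Drained, but more data may still arrive.
        System.out.println(rb.looksLikeEof() + " " + rb.isEof()); // true false
        rb.finish();
        System.out.println(rb.isEof()); // true
    }
}
```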

Fixed the linked list stuff.

I also did a fairly significant directory restructure. The modified SDK code is now in
src/contrib/SevenZip, with the Java code under src/java and the C code under src/native. I
removed all of my reformatting of the SDK code, so if a future version of the SDK is
released, it should be easier to diff against it and apply the patch to update this code.
The makefile there builds into the same build tree that the Java build otherwise uses.

To get it building correctly, I had to modify the base build.xml and contrib/build.xml.
compile-core-classes now also depends on compile-contrib-before, which calls compile-before
on contrib/build.xml. From there, contrib/build.xml calls compile-before on each
contrib/*/build.xml, with failonerror set to false so this change will not break any other
build scripts. (This change is required because Lzma{Input,Output}Stream requires the
classes in src/contrib/SevenZip to be built first.)

I also cleaned up the code and fixed all the review comments.

There will be at least one more version of this patch for things I didn't catch.

> Support for LZMA compression
> ----------------------------
>
>                 Key: HADOOP-6837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6837
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Nicholas Carlini
>            Assignee: Nicholas Carlini
>         Attachments: HADOOP-6837-lzma-1-20100722.non-trivial.pseudo-patch, HADOOP-6837-lzma-1-20100722.patch, HADOOP-6837-lzma-2-20100806.patch, HADOOP-6837-lzma-c-20100719.patch, HADOOP-6837-lzma-java-20100623.patch
>
>
> Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which generally achieves higher compression ratios than both gzip and bzip2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

