hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6837) Support for LZMA compression
Date Sat, 07 Aug 2010 03:25:19 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896213#action_12896213
] 

Greg Roelofs commented on HADOOP-6837:
--------------------------------------

20100806 version looks pretty good.  The first couple of issues are the main ones:

 * FakeOutputStream LinkedList of 1 KB buffers (BLOCKSIZE = 1024):  supposed to be 128 KB
buffers (package.html)
 * directory structure = unholy mix of 7zip, Hadoop  (expected directory structure similar
to LZMA SDK tarball contents, e.g.,  lzma912/Java/SevenZip/Compression/LZMA/Decoder.java and
lzma912/C/LzFind.c)
   ** OK to trim some of tarball's levels out (and to be different for Java, C), and top level
need not be "lzma912" (though it makes connection to download more obvious)--I know I suggested
"SevenZip", but I thought that appeared in both the Java and the C paths:  suggest lzma912
or lzma-9.12 instead
 * kDefaultDictionaryLogSize no longer changed to 16:  OK?
 * apparently bogus files:
   ** src/contrib/SevenZip/ivy/libraries.properties
   ** src/contrib/ec2/bin/hadoop-ec2-env.sh
 * "LZMA SDK" -> "LZMA SDK 9.12" in all boilerplate
 * CRC:  prefer to reuse existing (e.g., PureJavaCrc32); should be compatible
 * LzmaNativeInputStream.java:
   ** "circularwould"
   ** read():  fast busy-loop not thread-friendly...and not necessary:  read(b[]) (InputStream)
blocks until at least 1 byte available--zero returned only if b[] has length zero, which is
not true of oneByte[]
   ** read():  t unnecessary; just do:  return (int)oneByte[0] & 0xFF;
     or even:  return (ret == -1)? -1 : (int)oneByte[0] & 0xFF;
   ** 1<<13
 * LzmaNativeOutputStream.java:
   ** buffered, sendData -> compressedDirectBuf, uncompressedDirectBuf as above
   ** 1<<16
 * LzmaOutputStream.java:
   ** 1<<17
 * Makefile.am
   ** "All these are setup" -> "All these are set up"
   ** "are also done" -> "is also done"

(Per previous feedback, please change all "1<<N" to the Java equivalent of static const
int kSomeBufferSize = M * 1024:  easier to read, easier to change later.)

Btw, it's always best to follow the existing style consistently than to use your own for your
changes (with the possible exception of the boilerplate comment).  Perhaps Emacs hides the
issue, but with 8-space tabs, your changes to the contrib LZMA files are a complete mismatch
to their style.

Be sure to run "ant javadoc" (and fix any new issues) before the next patch, and give "ant
test" a shot, too (over the weekend if you happen to see this--it takes several hours to run).
 I'll work with you to get "ant test-patch" going.


> Support for LZMA compression
> ----------------------------
>
>                 Key: HADOOP-6837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6837
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Nicholas Carlini
>            Assignee: Nicholas Carlini
>         Attachments: HADOOP-6837-lzma-1-20100722.non-trivial.pseudo-patch, HADOOP-6837-lzma-1-20100722.patch,
HADOOP-6837-lzma-2-20100806.patch, HADOOP-6837-lzma-c-20100719.patch, HADOOP-6837-lzma-java-20100623.patch
>
>
> Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which generally achieves
higher compression ratios than both gzip and bzip2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message