hadoop-common-issues mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7206) Integrate Snappy compression
Date Thu, 23 Jun 2011 01:30:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053590#comment-13053590 ]

Scott Carey commented on HADOOP-7206:

bq. However, it has a serious drawback: the native code is not built on the target OS, only on
the same architecture. Because of this, the build is not easily reproducible, as there is no
knowledge of the OS used to build it.

Sure it is reproducible.  Snappy is used as a prebuilt artifact, not built from source.  The build
is reproducible because it _always_ uses the same artifact and always produces the same output.
Is it a requirement to recompile all Java jars for a build to be reproducible?

hadoop-snappy has another drawback/benefit pair:

Users may have snappy-java on their classpaths for their own use (for example via Avro, Hive, HBase,
or user code).
Drawback: the library can't be shared, bloating the number of classes and jars.
Benefit: the library won't have a version conflict.
Unknown (to me): does a snappy-java binding conflict with a custom Hadoop one if both are loaded
in the same JVM / Classloader?

I think the check for a system-available libsnappy.so, made before loading the one in the jar,
should go into the snappy-java project. Users can then optionally compile their own and make it
available to Hadoop, or use the one in the jar, and Hadoop has less code and build
infrastructure to maintain as a result.
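
To make the proposed load order concrete, here is a minimal sketch of "try the system library first, then fall back to the copy bundled in the jar". It is not snappy-java's actual loader: the class name NativeLoaderSketch, the Source enum, and the resource path passed in are all hypothetical, chosen only for illustration. It uses only standard JDK calls (System.loadLibrary, System.load, Class.getResourceAsStream).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; names are illustrative and not taken from snappy-java or Hadoop.
class NativeLoaderSketch {
    /** Where the native library ended up being loaded from. */
    public enum Source { SYSTEM, BUNDLED, NONE }

    /**
     * Tries a system-installed library first (e.g. libsnappy.so found via
     * java.library.path), then falls back to a copy packaged inside the jar.
     *
     * @param libName         platform-independent name, e.g. "snappy"
     * @param bundledResource absolute classpath resource of the bundled binary (assumed layout)
     */
    public static Source load(String libName, String bundledResource) {
        try {
            // Step 1: prefer a library the user compiled and installed on this machine.
            System.loadLibrary(libName);
            return Source.SYSTEM;
        } catch (UnsatisfiedLinkError notOnLibraryPath) {
            // Not installed system-wide; fall through to the bundled copy.
        }

        // Step 2: extract the bundled binary to a temp file, since System.load needs a real path.
        try (InputStream in = NativeLoaderSketch.class.getResourceAsStream(bundledResource)) {
            if (in == null) {
                return Source.NONE; // nothing bundled for this platform
            }
            Path tmp = Files.createTempFile("lib" + libName, ".so");
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
            return Source.BUNDLED;
        } catch (IOException | UnsatisfiedLinkError e) {
            return Source.NONE; // extraction failed or the binary does not match this OS/arch
        }
    }
}
```

A caller would invoke something like NativeLoaderSketch.load("snappy", "/native/libsnappy.so") once at startup and disable the codec when the result is NONE; the resource path here is an assumed layout, not the one snappy-java uses.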

> Integrate Snappy compression
> ----------------------------
>                 Key: HADOOP-7206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7206
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Eli Collins
>            Assignee: Alejandro Abdelnur
>             Fix For: 0.23.0
>         Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, v2-HADOOP-7206-snappy-codec-using-snappy-java.txt,
v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, v4-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> Google released Zippy as an open source (APLv2) project called Snappy (http://code.google.com/p/snappy).
This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum compression,
or compatibility with any other compression library; instead, it aims for very high speeds
and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is
an order of magnitude faster for most inputs, but the resulting compressed files are anywhere
from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses
at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
> {quote}

