hadoop-common-issues mailing list archives

From "Taro L. Saito (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7206) Integrate Snappy compression
Date Wed, 22 Jun 2011 03:22:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053019#comment-13053019 ]

Taro L. Saito commented on HADOOP-7206:
---------------------------------------

Let me clarify some differences between Issay's hadoop-snappy and my snappy-java:

hadoop-snappy
 * Uses libsnappy.so (available in recent Linux distributions) and libhadoopsnappy.so (JNI
code compiled for the target platform)

snappy-java
 * Uses libsnappyjava.so on Linux (combining the original snappy code and the JNI code in one library),
snappyjava.dll on Windows, or libsnappyjava.jnilib on Mac OS X
 * It copies the appropriate native library to the directory specified by the org.xerial.snappy.tempdir
or java.io.tmpdir system property.
 * If the dependencies on glibc (on Linux, GLIBC 2.3 or higher is currently required) or on dylib
(on Mac OS X) cause problems, you can recompile snappy-java's native library for your own platform
only (with make clean-native native). There is no need to build native libraries for platforms
you never use.
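To make the extraction step above concrete, here is a minimal Java sketch of how a loader might choose the platform-specific file name and the directory to copy it into. It is not snappy-java's actual loader: the class NativeLibLocator and its methods are hypothetical, and it relies only on the standard System.mapLibraryName and the two system properties named above.

```java
import java.io.File;

public class NativeLibLocator {
    // Hypothetical helper: the platform-specific file name for a JNI library.
    // System.mapLibraryName applies the OS naming convention, e.g.
    // "libsnappyjava.so" on Linux and "snappyjava.dll" on Windows; on Mac OS X
    // the suffix depends on the JDK version (".jnilib" or ".dylib").
    static String nativeLibraryFileName(String baseName) {
        return System.mapLibraryName(baseName);
    }

    // Hypothetical helper: the directory to extract the bundled library into.
    // Prefers org.xerial.snappy.tempdir and falls back to java.io.tmpdir.
    static File extractionDir() {
        String dir = System.getProperty("org.xerial.snappy.tempdir",
                System.getProperty("java.io.tmpdir"));
        return new File(dir);
    }

    public static void main(String[] args) {
        System.out.println(nativeLibraryFileName("snappyjava"));
        System.out.println(extractionDir().getAbsolutePath());
    }
}
```

Running it with -Dorg.xerial.snappy.tempdir=/some/dir shows the override taking precedence over the default temporary directory.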

What hadoop-snappy and snappy-java have in common:
 * Both approaches need the native code (libhadoopsnappy.so or libsnappyjava.so) to be compiled
somewhere. My snappy-java simply ships pre-compiled libsnappyjava.so for various platforms.

One of the design goals of snappy-java is to avoid trouble when linking against native libraries
(e.g., libsnappy.so), such as crashes due to libstdc++ incompatibilities, missing libraries, etc.
But as Alejandro suggested in my discussion group, using separate libsnappy.so and libsnappyjava.so
files is technically possible even in snappy-java:
 * First, try to load pre-installed libsnappy.so and libsnappyjava.so (here, a libsnappyjava.so
built without the snappy code, which it takes from libsnappy.so instead).
 * If they are not found, extract the copies embedded in the JAR to a local directory.
 * Load both native libraries.
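The three steps above could be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not snappy-java's code: the class TwoStepLoader and the in-JAR resource directory "/native/" are hypothetical, and only standard JDK calls (System.loadLibrary, System.load, Class.getResourceAsStream, Files.copy) are used.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class TwoStepLoader {
    // Hypothetical location of the bundled libraries inside the JAR.
    static final String RESOURCE_DIR = "/native/";

    // Loads the libraries in the given order, e.g. ("snappy", "snappyjava"),
    // so that libsnappy is in memory before libsnappyjava, which depends on it.
    static void loadAll(String... baseNames) throws IOException {
        for (String name : baseNames) {
            load(name);
        }
    }

    static void load(String baseName) throws IOException {
        try {
            // Step 1: try a pre-installed copy found on java.library.path.
            System.loadLibrary(baseName);
            return;
        } catch (UnsatisfiedLinkError notInstalled) {
            // Not installed; fall back to the copy embedded in the JAR.
        }
        // Step 2: extract the embedded copy to a local directory.
        File extracted = extract(RESOURCE_DIR + System.mapLibraryName(baseName));
        // Step 3: load it by absolute path.
        System.load(extracted.getAbsolutePath());
    }

    static File extract(String resourcePath) throws IOException {
        try (InputStream in = TwoStepLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("not installed and not bundled: " + resourcePath);
            }
            File dir = new File(System.getProperty("org.xerial.snappy.tempdir",
                    System.getProperty("java.io.tmpdir")));
            File out = File.createTempFile("native-", "-" + new File(resourcePath).getName(), dir);
            out.deleteOnExit();
            Files.copy(in, out.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return out;
        }
    }
}
```

A caller would invoke TwoStepLoader.loadAll("snappy", "snappyjava"). Whether the extracted libsnappyjava then binds to the libsnappy loaded just before it, rather than searching the system paths, is up to the platform's dynamic linker, which is part of why this mechanism is harder to make robust than a single combined library.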

I am not sure supporting such a loading mechanism is the right way to go.


> Integrate Snappy compression
> ----------------------------
>
>                 Key: HADOOP-7206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7206
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Eli Collins
>            Assignee: T Jake Luciani
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, v2-HADOOP-7206-snappy-codec-using-snappy-java.txt,
>                      v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, v4-HADOOP-7206-snappy-codec-using-snappy-java.txt,
>                      v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
>
>
> Google released Zippy as an open source (APLv2) project called Snappy (http://code.google.com/p/snappy).
> This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum compression,
> or compatibility with any other compression library; instead, it aims for very high speeds
> and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is
> an order of magnitude faster for most inputs, but the resulting compressed files are anywhere
> from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses
> at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
