hadoop-common-issues mailing list archives

From "Taro L. Saito (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7206) Integrate Snappy compression
Date Fri, 24 Jun 2011 02:19:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054215#comment-13054215 ]

Taro L. Saito commented on HADOOP-7206:

@Issei @Alejandro
Great. That means that as long as the same classloader is used (as Hadoop seems to do), sharing
libsnappy.so between hadoop-snappy and snappy-java is no problem. So whether to use libsnappy.so
or not is now a question for snappy-java alone, and I prefer to use libsnappyjava.so
(statically linked snappy + JNI code, built with the -fvisibility=hidden option), which avoids potential
API conflicts and missing-library problems (on some OSes).

In my experience developing sqlite-jdbc (http://sqlite-jdbc.googlecode.com/), which uses
the same technique of extracting the .so file at runtime, many people seem to be satisfied with
this approach. Runtime library extraction avoids failures caused by misconfiguration
(e.g., LD_LIBRARY_PATH) and by differences in the library build process (gcc, linker
options, etc.) on each OS. For example, I frequently develop code on Windows but
run the production code on Linux; not having to switch library files really helps me a
lot. However, judging from HADOOP-7405, Hadoop's current native libraries are not very portable
across OSes. In that situation, the motivation for using a portable library such as snappy-java
might be low.
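
To make the runtime extraction idea concrete, here is a minimal Java sketch. It is not the actual snappy-java or sqlite-jdbc code; the class name NativeLibraryExtractor, the method names, and the resource path passed in are all hypothetical. It copies a library bundled on the classpath to a temporary file and loads it with System.load, so the user does not have to set LD_LIBRARY_PATH.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only.
final class NativeLibraryExtractor {

    /** Copies a bundled library stream into a fresh temp directory and returns the file path. */
    static Path extractToTemp(InputStream bundled, String fileName) throws IOException {
        Path dir = Files.createTempDirectory("native-lib-");
        Path target = dir.resolve(fileName);
        Files.copy(bundled, target, StandardCopyOption.REPLACE_EXISTING);
        // deleteOnExit removes entries in reverse registration order,
        // so register the directory first and the file second.
        dir.toFile().deleteOnExit();
        target.toFile().deleteOnExit();
        return target;
    }

    /** Finds the library on the classpath, extracts it, and loads it by absolute path. */
    static void loadBundled(String resourcePath, String fileName) throws IOException {
        try (InputStream in = NativeLibraryExtractor.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("No bundled library at " + resourcePath);
            }
            Path lib = extractToTemp(in, fileName);
            // System.load takes an absolute path, so no library search path is consulted.
            System.load(lib.toAbsolutePath().toString());
        }
    }
}
```

A caller would invoke something like loadBundled("/native/Linux/x86_64/libsnappyjava.so", "libsnappyjava.so") once, before the first JNI call; that resource path is an assumed layout, not necessarily the one snappy-java uses.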

I don't care which one is used in Hadoop, but the discussion in this thread has been useful
for me to improve snappy-java. Thanks!

a) OS X (32-bit/64-bit) is already supported.
b) I need to know the values of the os.name and os.arch system properties that the IBM JVM reports.
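
To illustrate why the os.name and os.arch values matter, here is a rough Java sketch of how a loader might map them to a per-platform resource folder. The class PlatformKey, the "/native/<OS>/<arch>" layout, and the alias table are assumptions made for this example, not the real snappy-java mapping. A JVM that reports strings outside the alias table falls through to the default branch, which is the case the IBM JVM values would need to be checked against.

```java
import java.util.Locale;

// Hypothetical mapping from JVM system properties to a bundled-library folder.
final class PlatformKey {

    /** Collapses os.name variants such as "Windows 7" or "Mac OS X" into one folder name. */
    static String normalizeOs(String osName) {
        String s = osName.toLowerCase(Locale.ROOT);
        if (s.contains("windows")) return "Windows";
        if (s.contains("mac")) return "Mac";
        if (s.contains("linux")) return "Linux";
        // Unknown OS: keep the reported name, minus characters unsafe in a path.
        return osName.replaceAll("\\W", "");
    }

    /** Collapses os.arch aliases; e.g. "amd64" and "x86_64" name the same CPU family. */
    static String normalizeArch(String osArch) {
        String s = osArch.toLowerCase(Locale.ROOT);
        if (s.equals("amd64") || s.equals("x86_64")) return "x86_64";
        if (s.equals("x86") || s.matches("i[3-6]86")) return "x86";
        // Unknown architecture: keep the reported name, minus unsafe characters.
        return osArch.replaceAll("\\W", "");
    }

    /** Builds the (assumed) classpath folder that holds the library for this platform. */
    static String resourceFolder(String osName, String osArch) {
        return "/native/" + normalizeOs(osName) + "/" + normalizeArch(osArch);
    }
}
```

At runtime the inputs would come from System.getProperty("os.name") and System.getProperty("os.arch").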

Building a non-bundled .so file and embedding it into snappy-java is simple; just run "make".
As a matter of fact, I do this for 6 OS and CPU combinations. Also, by using VMware
Fusion on a Mac, all of the currently supported native libraries can be compiled on a single machine.
Some 64-bit OSes (e.g., Windows, Linux) can be used to build 32-bit native libraries.

> Integrate Snappy compression
> ----------------------------
>                 Key: HADOOP-7206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7206
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Eli Collins
>            Assignee: Alejandro Abdelnur
>             Fix For: 0.23.0
>         Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, v2-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, v4-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> Google released Zippy as an open source (APLv2) project called Snappy (http://code.google.com/p/snappy).
> This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum compression,
> or compatibility with any other compression library; instead, it aims for very high speeds
> and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is
> an order of magnitude faster for most inputs, but the resulting compressed files are anywhere
> from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses
> at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
> {quote}

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

