hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7206) Integrate Snappy compression
Date Wed, 22 Jun 2011 00:04:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052955#comment-13052955 ]

Todd Lipcon commented on HADOOP-7206:
-------------------------------------

Sorry, I stopped paying attention to this for a while... I have some concerns about the way
this ended up:

We're now pulling in a jar which auto-expands its .so dependency into /tmp and then loads the native
library from there. That's (a) messy, (b) potentially insecure unless there's a workaround to point it at
a directory other than /tmp, and (c) inconsistent with how our other native libraries are loaded. These are
the same arguments Alejandro made above.
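For readers unfamiliar with the pattern, the extract-and-load approach looks roughly like the minimal Java sketch below. The class name, method name, and the idea of passing in the target directory are hypothetical illustrations, not snappy-java's actual code. The point is that the library bytes are written to a filesystem location chosen at runtime before being loaded, which is why the choice of directory matters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the "unpack a bundled .so, then load it" pattern.
final class BundledLibExtractor {
    private BundledLibExtractor() {}

    /**
     * Copies the native library bytes from libBytes into a new, uniquely named
     * file under targetDir and returns its path. A real loader would then pass
     * the absolute path to System.load(...).
     * Taking targetDir as a parameter instead of hard-coding /tmp is the kind of
     * workaround alluded to above: a shared /tmp may be mounted noexec (so the
     * library cannot be mapped) or be writable by other local users.
     */
    static Path extract(InputStream libBytes, Path targetDir, String libName)
            throws IOException {
        Files.createDirectories(targetDir);
        // Unique name so concurrent JVMs do not overwrite each other's copy.
        Path out = Files.createTempFile(targetDir, libName + "-", ".so");
        // The empty temp file already exists, so REPLACE_EXISTING is required.
        Files.copy(libBytes, out, StandardCopyOption.REPLACE_EXISTING);
        // Best-effort cleanup when the JVM exits normally.
        out.toFile().deleteOnExit();
        return out;
    }
}
```

Libraries shipped in lib/native skip this copy step entirely, which is the inconsistency being pointed out.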

The Maven artifact we now depend on isn't easy to rebuild, and it's
not even clear how it gets built. For example, which glibc version is it linked against? Which
OS X version was the included dylib built on? That seems a little scary as a dependency.

It seems the motivation for switching from the hadoop-snappy approach to the snappy-java approach
was that the former depended on having libsnappy.so available at runtime, which isn't
always the case. I would propose the following:
- at build time, you choose one of: (a) disable snappy, (b) enable snappy and dynamically link
our JNI shims against libsnappy.so, or (c) enable snappy and statically link snappy into our JNI shims
- those who don't care about snappy choose (a)
- those who care about snappy and plan to deploy on systems where libsnappy.so is installed
system-wide (e.g., Fedora or most recent Ubuntu releases) can choose (b) to pick up the snappy
library from the system
- those who care about snappy and plan to deploy elsewhere choose (c), and just make sure
that snappy is available at compile time
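The three build-time choices amount to a small decision table. The following Java sketch is purely illustrative: the enum, its constants, and its methods are hypothetical and do not correspond to any actual Hadoop build flag or class.

```java
// Hypothetical model of the proposed build-time snappy options.
enum SnappyBuildMode {
    /** (a) Snappy support is not compiled in at all. */
    DISABLED(false, false),
    /** (b) JNI shim dynamically linked against the system's libsnappy.so. */
    DYNAMIC(true, true),
    /** (c) Snappy statically linked into the JNI shim; needed only at compile time. */
    STATIC(true, false);

    private final boolean codecAvailable;
    private final boolean needsSystemLibsnappy;

    SnappyBuildMode(boolean codecAvailable, boolean needsSystemLibsnappy) {
        this.codecAvailable = codecAvailable;
        this.needsSystemLibsnappy = needsSystemLibsnappy;
    }

    /** Whether the resulting build can compress/decompress with snappy. */
    boolean codecAvailable() {
        return codecAvailable;
    }

    /** Whether deployment hosts must provide libsnappy.so themselves. */
    boolean needsSystemLibsnappyAtRuntime() {
        return needsSystemLibsnappy;
    }

    /** Picks a mode following the guidance in the list above. */
    static SnappyBuildMode choose(boolean wantSnappy, boolean targetHasSystemLibsnappy) {
        if (!wantSnappy) {
            return DISABLED;
        }
        return targetHasSystemLibsnappy ? DYNAMIC : STATIC;
    }
}
```

Only option (b) leaves a runtime dependency on the host's libsnappy.so; option (c) trades a larger shim for not needing it.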

Then the hadoopsnappy.so can be included in lib/native just like our other native dependencies
without the unjar-to-tmp hackery.
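By contrast, a library shipped in lib/native is found through the JVM's standard library search path (java.library.path), which Hadoop's launch scripts typically point at lib/native, so no extraction step is needed. A minimal sketch, assuming the shim is named hadoopsnappy as in the comment above; the class and method names are hypothetical and are not Hadoop's actual native-loading API.

```java
// Hypothetical loader for a JNI shim installed alongside other native libs.
final class NativeShimLoader {
    private NativeShimLoader() {}

    /**
     * Tries to load the platform-specific library for the given base name
     * (e.g. libhadoopsnappy.so on Linux) from java.library.path.
     * Returns false instead of throwing when it is absent, so callers can
     * report the codec as unavailable rather than failing outright.
     */
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }
}
```

For example, NativeShimLoader.tryLoad("hadoopsnappy") returns true only if the shim (and, for option (b), a system libsnappy.so it depends on) can be resolved on that host.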

Does this idea address everyone's goals?

> Integrate Snappy compression
> ----------------------------
>
>                 Key: HADOOP-7206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7206
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Eli Collins
>            Assignee: T Jake Luciani
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, v2-HADOOP-7206-snappy-codec-using-snappy-java.txt,
>                      v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, v4-HADOOP-7206-snappy-codec-using-snappy-java.txt,
>                      v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
>
>
> Google released Zippy as an open source (APLv2) project called Snappy (http://code.google.com/p/snappy).
> This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum compression,
> or compatibility with any other compression library; instead, it aims for very high speeds
> and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is
> an order of magnitude faster for most inputs, but the resulting compressed files are anywhere
> from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses
> at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
