hadoop-common-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (HADOOP-7206) Integrate Snappy compression
Date Thu, 23 Jun 2011 01:02:49 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur reopened HADOOP-7206:
----------------------------------------


After mulling over this issue a bit more, reading Todd's comment a few times, and asking around
among folks who deal with native libs, I'm having second thoughts about the committed patch based
on snappy-java.

The snappy-java approach is tempting because it 'just works' (without having to install the snappy
SO on your system). However, it has a serious drawback: the native code is not built on the target
OS, only for the same architecture. Because of this, the build is not easily reproducible, as there
is no knowledge of the OS used to build it. In addition, this can lead to dependencies that are not
available on the running OS.

The hadoop-snappy approach has the drawback that it requires an additional step (installing the
snappy SO on the platform), but its benefit is that it addresses the drawbacks of the snappy-java
approach: the native code is built on the target OS, which results in easily reproducible
builds. Furthermore, the drawback is transient: it lasts only until snappy is available on the
different OSes by default or through OS-driven updates.

A secondary issue is that the snappy-java nativelib statically links snappy. As the snappy SO makes
its way into standard Linux distributions, snappy-java will keep using a private copy of it instead
of the one installed in the OS. On the other hand, the hadoop-snappy SO dynamically links the snappy
SO, so when the snappy SO is available in the OS, it can be consumed directly from there. (snappy-java
could address this if it changed to dynamically link the snappy SO.)
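
To make the dynamic case concrete, here is a minimal Java sketch, not taken from either patch,
of probing whether a shared library can be loaded from the JVM's library path. The class name
NativeLibProbe and the method tryLoad are hypothetical, and the sketch assumes java.library.path
covers the directory where the OS installs the snappy SO. Only System.loadLibrary and
UnsatisfiedLinkError are standard Java.

```java
// Hypothetical helper for illustration only; not part of Hadoop, hadoop-snappy, or snappy-java.
public final class NativeLibProbe {

    private NativeLibProbe() {
    }

    // Returns true if the JVM can load the named library from java.library.path.
    // On Linux the name "snappy" is mapped to the file name libsnappy.so.
    public static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Library missing, built for another platform, or loading not permitted.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the OS package installs the library under the plain name "snappy".
        System.out.println("snappy loadable: " + tryLoad("snappy"));
    }
}
```

With a probe like this, a dynamically linked codec could report that the OS-provided snappy SO
is missing rather than silently relying on a bundled copy.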

Because of this, I'd like to revert the snappy-java-based patch and go with Issay's hadoop-snappy
patch.

> Integrate Snappy compression
> ----------------------------
>
>                 Key: HADOOP-7206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7206
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Eli Collins
>            Assignee: Alejandro Abdelnur
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, v2-HADOOP-7206-snappy-codec-using-snappy-java.txt,
>                      v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, v4-HADOOP-7206-snappy-codec-using-snappy-java.txt,
>                      v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
>
>
> Google released Zippy as an open source (APLv2) project called Snappy (http://code.google.com/p/snappy).
> This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum compression,
> or compatibility with any other compression library; instead, it aims for very high speeds
> and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is
> an order of magnitude faster for most inputs, but the resulting compressed files are anywhere
> from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses
> at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
