hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "UsingLzoCompression" by RyanRawson
Date Wed, 06 May 2009 20:17:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by RyanRawson:

New page:
== Warning ==

This doc applies only to 0.20.  If you are running 0.19.x, please consider upgrading.

== Why compression? ==

When compression is enabled, the store file (HFile) applies a compression algorithm to blocks
as they are written (during flushes and compactions); those blocks must then be decompressed
when they are read.

Since this adds a read-time penalty, why would one enable any compression?  There are a few
reasons why the advantages of compression can outweigh the disadvantages:
* Compression reduces the number of bytes written to/read from HDFS
* Compression makes more efficient use of network bandwidth and disk space
* Compression reduces the amount of data that must be read to serve a read request

To keep the overhead as low as possible, a real-time compression library is preferred.  Out of
the box, HBase ships with only Gzip compression, which is fairly slow.

To achieve maximal performance and benefit, you must enable LZO.

== Enabling Lzo compression in HBase ==

Lzo is a GPL'ed native library that ships with most Linux distributions.  However, to use
it in HBase, you must complete the following steps:

Ensure the native Lzo base library is available on every node:
* on Ubuntu: apt-get install liblzo2-dev
* or download and build it from source: [http://www.oberhumer.com/opensource/lzo/]
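
Because the base library has to be present on every node, a quick check per node can save a failed build later. The following is only a sketch: the function name has_lzo2 is hypothetical, and it assumes a glibc-based Linux system where `ldconfig -p` lists the shared libraries known to the dynamic linker.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase or hadoop-gpl-compression).
# Reads a shared-library listing on stdin and succeeds if liblzo2 appears in it.
has_lzo2() {
    grep -q 'liblzo2\.so'
}

# Example use on a Linux node (assumes ldconfig is on the PATH):
#   ldconfig -p | has_lzo2 && echo "liblzo2 found" || echo "liblzo2 missing"
```

Note that building the native connector also needs the development headers, which is why the -dev package is the one listed above.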

Download/patch the native connector library:
* Download/checkout: [http://code.google.com/p/hadoop-gpl-compression/]
* Apply the patch attached to this issue: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=6]
* On Linux you may need to apply the patch: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=5]
* On Mac you may be interested in: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=7]
** You will probably also have to add the following line to build.xml, just above the call
to 'configure' in the compile-native target:
        <env key="CFLAGS" value="-arch x86_64" />

Build the native connector library:
* ant compile-native
* ant jar

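The build now produces the following results:
* build/hadoop-gpl-compression-0.1.0-dev.jar
* build/native/Linux-amd64-64/lib/libgplcompression.*

Instead of Linux-amd64-64, you might have Linux-i386-32 or Mac_OS_X-x86_64-64 or whatever
platform you are actually using.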

Copy the results into the hbase lib directory:
* build/hadoop-gpl-compression-0.1.0-dev.jar -> hbase/lib/
* build/native/Linux-amd64-64/lib/libgplcompression.* -> hbase/lib/native/Linux-amd64-64/

Note that the build output has an extra 'lib' level, which is not present under hbase/lib/native/.
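
The copy step can be scripted so that it is repeatable across nodes. This is a minimal sketch, not an official install script: the function name install_lzo_connector and its three arguments (build directory, HBase directory, platform name) are assumptions made for illustration, and the jar is matched with a wildcard in case the version differs from 0.1.0-dev.

```shell
#!/bin/sh
# Hypothetical helper: copy the connector jar and native libraries into HBase.
#   $1 = build directory of hadoop-gpl-compression (e.g. ./build)
#   $2 = HBase installation directory
#   $3 = platform name (e.g. Linux-amd64-64, Linux-i386-32)
install_lzo_connector() {
    build_dir=$1
    hbase_dir=$2
    platform=$3

    native_dest="$hbase_dir/lib/native/$platform"
    # Creates hbase/lib and the platform directory if they do not exist yet.
    mkdir -p "$native_dest" || return 1

    # The jar goes directly into hbase/lib/.
    cp "$build_dir"/hadoop-gpl-compression-*.jar "$hbase_dir/lib/" || return 1

    # The source path has an extra 'lib' level; the destination does not.
    cp "$build_dir/native/$platform/lib"/libgplcompression.* "$native_dest/" || return 1
}

# Example use (paths are placeholders):
#   install_lzo_connector build /path/to/hbase Linux-amd64-64
```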

== Using Lzo ==

When creating tables in the hbase shell, specify the compression flag for each column family:
 create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
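
If tables are created from a script rather than interactively, the same statement can be generated and piped into the shell. This is a sketch under assumptions: the function name lzo_create_stmt is hypothetical, and it assumes your `hbase shell` accepts commands on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print a create statement with Lzo enabled for one column family.
#   $1 = table name, $2 = column family name (without the trailing colon)
lzo_create_stmt() {
    printf "create '%s', {NAME=>'%s:', COMPRESSION=>'lzo'}\n" "$1" "$2"
}

# Example use (assumes the hbase launcher is on the PATH):
#   lzo_create_stmt mytable colfam | hbase shell
```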

That's it!
