hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "UsingLzoCompression" by TedYu
Date Fri, 09 Jul 2010 04:50:58 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "UsingLzoCompression" page has been changed by TedYu.
http://wiki.apache.org/hadoop/UsingLzoCompression?action=diff&rev1=20&rev2=21

--------------------------------------------------

== Warning ==
This doc only applies to 0.20 and beyond.  If you are under 0.19.x, please consider upgrading.

This distro doesn't contain all bug fixes (such as the case where LZO header or block header data falls on a read boundary).

Please get the latest from http://github.com/kevinweil/hadoop-lzo
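
For instance, a minimal sketch of grabbing that tree (this assumes it builds with the same ant targets as hadoop-gpl-compression, shown further below):

{{{
$ git clone git://github.com/kevinweil/hadoop-lzo.git
$ cd hadoop-lzo
$ ant clean compile-native tar
}}}
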
== Why compression? ==
By enabling compression, the store file (HFile) will use a compression algorithm on blocks as they are written (during flushes and compactions), and those blocks must then be decompressed when read.

Since this adds a read-time penalty, why would one enable any compression?  There are a few reasons why the advantages of compression can outweigh the disadvantages:

 * Compression reduces the number of bytes written to/read from HDFS
 * Compression effectively improves the efficiency of network bandwidth and disk space
 * Compression reduces the size of data needed to be read when issuing a read
  
To be as low-friction as possible, a real-time compression library is preferred.  Out of the box, HBase ships with only Gzip compression, which is fairly slow.

To achieve maximal performance and benefit, you must enable LZO.
  
== Enabling Lzo compression in HBase ==
Lzo is a GPL'ed native library that ships with most Linux distributions.  However, to use it in HBase, one must do the following steps:

Ensure the native Lzo base library is available on every node:

 * on Ubuntu: apt-get install liblzo2-dev
 * or download and build http://www.oberhumer.com/opensource/lzo/
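
For example, a minimal sketch of the Ubuntu route, which also confirms the loader can find the library afterwards (the exact version number may differ on your system):

{{{
$ sudo apt-get install liblzo2-dev
$ ldconfig -p | grep liblzo2   # should list liblzo2.so.2
}}}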
  
Checkout the native connector library:

 * The project is http://code.google.com/p/hadoop-gpl-compression/
 * For 0.20.2 checkout branches/branch-0.1
 * For 0.21 or 0.22 checkout trunk (see the checkout sketch just below)
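
A sketch of the corresponding svn commands, assuming the standard Google Code repository layout:

{{{
# for HBase 0.20.2
$ svn checkout http://hadoop-gpl-compression.googlecode.com/svn/branches/branch-0.1/ hadoop-gpl-compression
# for 0.21 or 0.22
$ svn checkout http://hadoop-gpl-compression.googlecode.com/svn/trunk/ hadoop-gpl-compression
}}}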
  
On Mac:

 * To install the hadoop-gpl-compression library on a Mac, it is advisable to use MacPorts. To do so you must do the following:

(Parts of this found on http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ )
  
{{{
> port fetch lzo2 # If for some reason LZO2 is already installed, please uninstall it first before doing this
> port edit lzo2 # A vim editor should open

// Add the following block of text in the file and save the file.
variant x86_64 description "Build the 64-bit." {
    configure.args-delete     --build=x86-apple-darwin ABI=standard
    configure.cflags-delete   -m32
    configure.cflags-append   -m64
}

> port install lzo2 +x86_64
}}}
This ensures the library is built in 64-bit mode, because Java 1.6 on the Mac is 64-bit only.  Also, to make sure your lzo library is x86_64 as well, type:

{{{
$ file /usr/local/lib/liblzo2.2.0.0.dylib
/usr/local/lib/liblzo2.2.0.0.dylib: Mach-O 64-bit dynamically linked shared library x86_64
}}}
 * On Mac you might want to use a command line like the following, in the hadoop-gpl-compression home directory:

{{{
env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/ \
C_INCLUDE_PATH=/path/to/lzo64/include LIBRARY_PATH=/path/to/lzo64/lib \
CFLAGS="-arch x86_64" ant clean compile-native test tar
}}}
 * Note: If you used MacPorts, /path/to/lzo64 will be replaced by /opt/local (e.g. /opt/local/include and /opt/local/lib)
 * Note: If for some reason you are getting compilation errors, you can add the following to the environment variables:

{{{
CLASSPATH=$HADOOP_HOME/hadoop-<version>-core.jar
}}}
 * Note: Also during this install, if you are running into permission-denied errors, even as root, you can go ahead and change the permissions of those files in order for the build to complete.
  
Once the install has completed, a jar file and lib files have been created in the HADOOP-GPL-HOME/build directory.  All these files MUST be copied both into your HADOOP_HOME and HBASE_HOME directories using the following commands from the HADOOP-GPL-HOME directory:

{{{
> cp build/hadoop-gpl-compression-0.1.0-dev.jar $HADOOP_HOME/lib/
> cp build/hadoop-gpl-compression-0.1.0-dev.jar $HBASE_HOME/lib/
> tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C $HADOOP_HOME/lib/native
> tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C $HBASE_HOME/lib/native
}}}
To build lzo2 from source in 64-bit mode:

{{{
$ CFLAGS="-arch x86_64" ./configure --build=x86_64-darwin --enable-shared --disable-asm
<configure output>
$ make
$ sudo make install
}}}
On Linux:

 * On Linux (with the gcc compiler), to compile for a 64-bit machine:

{{{
$ export CFLAGS="-m64"
}}}
Build the native connector library:

{{{
$ ant compile-native
$ ant jar
}}}
On Mac, the resulting library should be x86_64, as above.  If not, add the extra CFLAGS to build.xml in the call to configure in the compile-native target, as listed above.
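
To check the architecture of the freshly built connector library, you can run file on it (the platform directory name here is an assumption; yours may differ):

{{{
$ file build/native/Mac_OS_X-x86_64-64/lib/libgplcompression*
}}}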
  
Now you have the following results:

{{{
 build/hadoop-gpl-compression-0.1.0-dev.jar
 build/native/Linux-amd64-64/lib/libgplcompression.*
}}}
You might have Linux-i386-32 or Mac_OS_X-x86_64-64 or whatever platform you are actually using.
  
Copy the results into the hbase lib directory:

{{{
$ cp build/hadoop-gpl-compression-0.1.0-dev.jar hbase/lib/
$ cp build/native/Linux-amd64-64/lib/libgplcompression.* hbase/lib/native/Linux-amd64-64/
}}}
Note there is an extra 'lib' level in the build, which is not present in the hbase/lib/native/ tree.
  
(VERY IMPORTANT) Distribute the new files to every machine in your cluster.
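
A hypothetical sketch of one way to push the files out, assuming a plain-text cluster-hosts.txt with one hostname per line and identical install paths on every node (do the same for $HADOOP_HOME):

{{{
for host in $(cat cluster-hosts.txt); do
  rsync -av $HBASE_HOME/lib/hadoop-gpl-compression-0.1.0-dev.jar $host:$HBASE_HOME/lib/
  rsync -av $HBASE_HOME/lib/native/ $host:$HBASE_HOME/lib/native/
done
}}}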
  
== Using Lzo ==
While creating tables in hbase shell, specify the per-column family compression flag:

{{{
 create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
}}}
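
You can check that the flag took effect with describe, and turn compression on for an existing table with the usual disable/alter/enable dance (a sketch reusing the example names above; the syntax may vary slightly between hbase shell versions):

{{{
 describe 'mytable'
 disable 'mytable'
 alter 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
 enable 'mytable'
}}}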
That's it!
  
== Testing Compression is enabled ==
One more thing: to test that compression is properly enabled, run {{{./bin/hbase org.apache.hadoop.hbase.util.CompressionTest}}} (the above presumes at least hbase 0.20.1).  Run without arguments, it will dump out usage on how to run the CompressionTest.  Be sure to run it on all nodes in your cluster to ensure compression is working on all of them.
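
A typical invocation might look like the following (the test-file path and the argument order are assumptions; trust the usage message the tool prints):

{{{
$ ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/compression-test lzo
}}}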
  
== Other tools ==
Does this help?  Todd Lipcon's [[http://github.com/toddlipcon/hadoop-lzo-packager|hadoop-lzo-packager]]
  
== Troubleshooting ==
If you get ''com.hadoop.compression.lzo.LzoCompressor: java.lang.UnsatisfiedLinkError'', check whether the 64-bit lzo libraries were installed in /usr/lib rather than /usr/lib64.  Even though a standalone java application could load the lzo library from /usr/lib, hadoop/hbase wouldn't pick it up there.  Just copy the liblzo files over and make the appropriate links.  (From Samuel Yu on the mailing list)
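
A sketch of that fix, assuming the installed version is liblzo2.so.2.0.0 (adjust the filename to whatever is actually present in /usr/lib):

{{{
$ sudo cp /usr/lib/liblzo2.so.2.0.0 /usr/lib64/
$ cd /usr/lib64
$ sudo ln -s liblzo2.so.2.0.0 liblzo2.so.2
$ sudo ln -s liblzo2.so.2.0.0 liblzo2.so
$ sudo ldconfig
}}}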
  
