hadoop-common-user mailing list archives

From Josh Patterson <j...@cloudera.com>
Subject Re: Newbie to HDFS compression
Date Fri, 25 Jun 2010 14:21:42 GMT
Raymond,

LZO installation can still be daunting, even with the more recent
developments out there.

Most of this information is up at:

http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ

My quick guide: Installation for RedHat / Centos

- watch out for the various RPMs needed for lzo / lzo2 / lzo-devel support
- get the native libs into the hadoop/lib subdir from:
http://code.google.com/p/hadoop-gpl-compression/
- double-check the permissions on these files; typically "rw-rw-r--"
(664) works well. Also check the owner.
- get Ant 1.8 to build the git repository if you are building any of the source
- move the lzo jar into the hadoop/lib subdir
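
The steps above, as a rough shell sketch. Package names, paths, and the
service user are assumptions based on a typical CDH-style layout; verify
them against your own install:

```shell
# Assumed package names and paths -- adjust for your repos and layout.
sudo yum install lzo lzo-devel            # lzo runtime + headers
cp hadoop-lzo-0.4.3.jar /usr/lib/hadoop/lib/
cp -r build/native/* /usr/lib/hadoop/lib/native/
chmod 664 /usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar   # "rw-rw-r--"
chown hadoop:hadoop /usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar  # assumed user
```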


Changes to config: mapred-site.xml (add the following entries)

  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>

  <property>
    <name>mapred.child.env</name>
    <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native</value>
  </property>

  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>


Changes to config: core-site.xml (add the following entries)

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>



hadoop-env.sh

export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32
(or /usr/lib/hadoop/lib/native/Linux-amd64-64 for the 64-bit version)
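
A quick sanity check of the layout those exports assume (the paths are
the CDH-style defaults used above; adjust to your install):

```shell
# Confirm the jar and native library directory actually exist where
# hadoop-env.sh points.
ls -l /usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar
ls /usr/lib/hadoop/lib/native/   # look for Linux-i386-32 or Linux-amd64-64
```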

Usage

For the old "mapred" API (deprecated, then undeprecated), to use lzo
files as input to a MR job:

conf.setInputFormat(DeprecatedLzoTextInputFormat.class);

Use "lzop" to compress the file

http://www.lzop.org/

To index the file for splitting on input:

In process locally:

hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo

On the cluster, as a MR job:

hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer /hdfs/dir/big_file.lzo
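
Putting the usage steps together, an end-to-end sketch. The file names
and the /data HDFS path are placeholders, not anything from a real
install:

```shell
# Compress locally, upload, then index so the .lzo file becomes splittable.
lzop big_file.txt                      # produces big_file.txt.lzo
hadoop fs -put big_file.txt.lzo /data/
hadoop jar /path/to/your/hadoop-lzo.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer /data/big_file.txt.lzo
# writes /data/big_file.txt.lzo.index alongside the data file
```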

To compress the output of the entire job so that the output file in
HDFS is an LZO-compressed file:

TextOutputFormat.setCompressOutput(conf, true);
TextOutputFormat.setOutputCompressorClass(conf,
com.hadoop.compression.lzo.LzopCodec.class);


Josh Patterson

Solutions Architect
Cloudera

On Thu, Jun 24, 2010 at 5:12 PM, Raymond Jennings III
<raymondjiii@yahoo.com> wrote:
>
> Oh, maybe that's what I meant :-)  I recall reading something on this mail group that
> "the compression" is not included with the hadoop binary and that you have to get and
> install it separately due to license incompatibilities.  Looking at the config xml files
> it's not clear what I need to do.  Thanks.
>
>
>
> ----- Original Message ----
> From: Eric Sammer <esammer@cloudera.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, June 24, 2010 5:09:33 PM
> Subject: Re: Newbie to HDFS compression
>
> There is no file system level compression in HDFS. You can store
> compressed files in HDFS, however.
>
> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
> <raymondjiii@yahoo.com> wrote:
> > Are there instructions on how to enable (which type?) of compression on hdfs?  Does
> > this have to be done during installation or can it be added to a running cluster?
> >
> > Thanks,
> > Ray
> >
> >
> >
> >
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
