hadoop-common-user mailing list archives

From Josh Patterson <j...@cloudera.com>
Subject Re: Newbie to HDFS compression
Date Fri, 25 Jun 2010 14:21:42 GMT

LZO installation can be daunting, even with the more recent
developments out there.

Most of this information is up at:


My quick guide: Installation for RedHat / CentOS

- watch out for the various RPMs needed (lzo, lzo2, and the -devel packages)
- get the native libs in the hadoop/lib subdir from:
- double check the permissions on these files; typically 664
("rw-rw-r--") works well. Also check the owner.
- use Ant 1.8 to build from the git repository if you are building any of the source
- move the lzo.jar into the hadoop/lib subdir

Changes to config: mapred-site.xml (add the following entries)
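The entries themselves did not survive in the archived message. A commonly used set for hadoop-lzo of this era (old, pre-0.21 property names; the codec class comes from the hadoop-lzo project) is:

```xml
<!-- mapred-site.xml: compress intermediate map output with LZO -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```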




Changes to Config: core-site.xml

Add these entries:
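These entries were also lost in the archive. The usual core-site.xml additions register the LZO codecs alongside the built-in ones (codec class names from the hadoop-lzo project):

```xml
<!-- core-site.xml: make the LZO codecs known to Hadoop -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

Note that LzopCodec (for .lzo files written by the lzop tool) is distinct from LzoCodec (the raw stream format).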



In hadoop-env.sh (or your shell environment):

export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32
(or the 64-bit equivalent, e.g. Linux-amd64-64)


For the older (deprecated, then un-deprecated) mapred API, to use LZO files as input to an MR job:

conf.setInputFormat( DeprecatedLzoTextInputFormat.class );
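A slightly fuller sketch of that job setup, assuming the hadoop-lzo jar is on the classpath (DeprecatedLzoTextInputFormat lives in the com.hadoop.mapred package in that project; MyJob is a placeholder for your driver class):

```java
import org.apache.hadoop.mapred.JobConf;
import com.hadoop.mapred.DeprecatedLzoTextInputFormat;

JobConf conf = new JobConf(MyJob.class);
// Read .lzo input files, honoring the .index files so splits
// land on LZO block boundaries
conf.setInputFormat(DeprecatedLzoTextInputFormat.class);
```

Without the index files (see below), each .lzo file is processed as a single unsplittable input.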

Use the "lzop" command-line tool to compress the file (lzop big_file produces big_file.lzo).


To index the file for splitting on input:

In-process, locally:

hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo

On the cluster, as an MR job:

hadoop jar /path/to/your/hadoop-lzo.jar
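The class name appears to have been cut off from that command; in the hadoop-lzo project, the MapReduce-based indexer is typically invoked as:

```
hadoop jar /path/to/your/hadoop-lzo.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer big_file.lzo
```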

To compress the output of the entire job so that the output file in
HDFS is an LZO-compressed file:

TextOutputFormat.setCompressOutput(conf, true);
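On its own, setCompressOutput enables compression but not which codec is used. A sketch of the full output-side setup (assuming the core-site.xml codec registration above; LzopCodec produces output readable by the lzop tool and by LzoIndexer):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import com.hadoop.compression.lzo.LzopCodec;

// Enable output compression and pick the LZO (lzop-format) codec
TextOutputFormat.setCompressOutput(conf, true);
TextOutputFormat.setOutputCompressorClass(conf, LzopCodec.class);
```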

Josh Patterson

Solutions Architect

On Thu, Jun 24, 2010 at 5:12 PM, Raymond Jennings III
<raymondjiii@yahoo.com> wrote:
> Oh, maybe that's what I meant :-)  I recall reading something on this mail group that
> "the compression" is not included with the Hadoop binary and that you have to get and
> install it separately due to license incompatibilities.  Looking at the config xml files
> it's not clear what I need to do.  Thanks.
> ----- Original Message ----
> From: Eric Sammer <esammer@cloudera.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, June 24, 2010 5:09:33 PM
> Subject: Re: Newbie to HDFS compression
> There is no file-system-level compression in HDFS. You can store
> compressed files in HDFS, however.
> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
> <raymondjiii@yahoo.com> wrote:
> > Are there instructions on how to enable (which type?) of compression on HDFS?  Does
> > this have to be done during installation or can it be added to a running cluster?
> >
> > Thanks,
> > Ray
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
