hadoop-common-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: LZO Compression
Date Sun, 30 Oct 2011 16:33:28 GMT
Thanks for the info, very helpful.

What's the difference between LZO and Snappy? I like that Cloudera has 
Snappy support, so it looks like I'm going to go with that, but I just 
wanted to know the tradeoffs.

Thanks again

On 10/29/11 8:52 PM, Harsh J wrote:
> Hey Mark,
> (Before you jump in with LZO, perhaps consider using Snappy+SequenceFiles?)
> On 30-Oct-2011, at 7:59 AM, Mark wrote:
>> Email was sent a bit prematurely.
>> Anyway. How can one test that LZO compression is configured correctly? I've found multiple sources on how to compile the hadoop-lzo jars and native files, but nowhere did I see a definitive way to test that the installation/configuration is correct.
> You can run the compression codec test on each node, or run a job that reads or writes with that codec.
> Single node test example, using an available test jar:
> $ HADOOP_CLASSPATH=/usr/lib/hadoop/hadoop-test-0.20.2-cdh3u2.jar hadoop org.apache.hadoop.io.compress.TestCodec -count 1000 -codec com.hadoop.compression.lzo.LzoCodec
>> Also, when is this compression enabled? Is it enabled on every file I write? Do I somehow have to specify that I want to use this format? For example we have a rather large directory of server logs ... /user/mark/logs. How can we enable compression on this directory?
> Compression in HDFS is purely a client-side setting. You can't enable it 'globally'.
> For jobs, you may set the {File}OutputFormat#setOutputCompressorClass(…) to the desired class to have final job outputs written with that codec (compression of write streams is toggled by {File}OutputFormat#setCompressOutput(…)). For optimizing the transient stages, you can use JobConf#setMapOutputCompressorClass(…) and toggle with JobConf#setCompressMapOutput(…).
> Reading compressed files back again is handled automagically by your Hadoop framework, and should require no settings.
> Hence, for a fully distributed test of your LZO install (which you have hopefully done with Todd's easy tool at https://github.com/toddlipcon/hadoop-lzo-packager), you can run a simple parameterized (or mapred-site.xml configured) wordcount via an available example:
> $ hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.2-cdh3u2.jar wordcount -Dmapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec -Dmapred.output.compress=true inputDir outputDir
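
For the "mapred-site.xml configured" variant mentioned above, a minimal sketch of the equivalent configuration might look like the fragment below. It simply restates the two -D properties from the wordcount command as entries in the client's mapred-site.xml. It assumes the hadoop-lzo jar and native libraries are already installed on the nodes, and any other codec-related settings your cluster needs are not shown.

```xml
<?xml version="1.0"?>
<!-- Sketch of a client-side mapred-site.xml; assumes hadoop-lzo is installed. -->
<configuration>
  <!-- Same as -Dmapred.output.compress=true: compress final job output. -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <!-- Same as -Dmapred.output.compression.codec=...: codec used for that output. -->
  <property>
    <name>mapred.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
```

With these set on the submitting client, the wordcount example should be runnable without the -D flags. Since compression is a client-side setting, jobs submitted from a client without this file would still write uncompressed output.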
> Hope this helps!
> --
> Harsh J
