hadoop-mapreduce-user mailing list archives

From Utku Can Topçu <u...@topcu.gen.tr>
Subject LZO Compression in Hadoop release 0.20.1
Date Mon, 12 Oct 2009 08:14:09 GMT
Hey Guys,

After a couple of years of using Hadoop for personal projects, I finally had the
chance to scale it up for enterprise use.
Until now, I hadn't really had any need to compress my files.
With the emerging need to load TBs of data a day, a quick Google search turned up
Johan's blog post on LzoTextInputFormat,
which led me to the http://code.google.com/p/hadoop-gpl-compression/ project.

Anyway, here's the deal:
* I compiled both the native binaries and the Java classes from the sources.
* Added the required configuration settings.
* Compressed a 100 GB log file to an .lzo file and put it in HDFS.
* Created the index on the file using the indexer tool in kevinweil's hadoop-lzo.
* Used LzoTextInputFormat.setInputPaths to point the job at the compressed log file.
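For reference, the configuration step above would look roughly like the following in core-site.xml. This is a sketch based on the property names documented for hadoop-gpl-compression/hadoop-lzo, not my exact settings:

```xml
<!-- Sketch: register the LZO codecs alongside the default codec list -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

The native LZO libraries also have to be on the task JVMs' java.library.path for the codec to load.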

I fired off the MapReduce job; however, the whole process ran in just one mapper. I
couldn't get the work distributed across the cluster.
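For context on what I tried: as I understand it, LzoTextInputFormat only generates multiple splits when it finds the index file sitting next to the .lzo file; otherwise the whole file becomes a single split. The indexing step looked roughly like this (class name from kevinweil's hadoop-lzo; paths are illustrative):

```shell
# Build the index; this should write /logs/access.lzo.index next to the file
hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /logs/access.lzo

# Verify the .index file actually exists; without it the job falls back to one split
hadoop fs -ls /logs/
```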

Any ideas and comments on how to distribute this to multiple mappers?

