hadoop-mapreduce-user mailing list archives

From Utku Can Topçu <u...@topcu.gen.tr>
Subject LZO Compression in Hadoop release 0.20.1
Date Mon, 12 Oct 2009 08:14:09 GMT
Hey Guys,

After a couple of years of using Hadoop for personal projects, I finally had
the chance to scale it up for enterprise use.
Until now, I never really needed to compress my files.
Faced with the need to handle TBs of load a day, a quick Google search turned
up Johan's blog post on LzoTextInputFormat
http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html,
which led me to the http://code.google.com/p/hadoop-gpl-compression/ project
and http://github.com/kevinweil/hadoop-lzo

Anyway, here's the deal:
* I compiled both the native binaries and the Java classes from the sources.
* Added the required configuration settings per
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
* Compressed a 100 GB log file to an LZO file and put it into HDFS,
* Created the index on the file using the indexer tool in kevinweil's hadoop-lzo,
* Used LzoTextInputFormat.setInputPaths to point the job at the compressed log file.
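
For reference, a rough sketch of the steps above as commands (file names and
the jar path are just placeholders for my environment, not the actual ones I
used):

```shell
# Compress the log locally with lzop, which produces the .lzo framing
# that hadoop-lzo's LzopCodec understands
lzop -o access.log.lzo access.log

# Put the compressed file into HDFS (path is illustrative)
hadoop fs -put access.log.lzo /logs/access.log.lzo

# Build the block index with the indexer from kevinweil's hadoop-lzo;
# this writes /logs/access.log.lzo.index next to the file, which is what
# lets LzoTextInputFormat generate more than one split
hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer \
    /logs/access.log.lzo
```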

I fired off the MapReduce program; however, the whole job ran in just one
mapper. I wasn't able to distribute the map work across the cluster.

Any ideas or comments on how to get this distributed across multiple mappers?

Thanks,
Utku
