hadoop-common-user mailing list archives

From Arun C Murthy <ar...@yahoo-inc.com>
Subject Re: File Compression
Date Tue, 13 Nov 2007 17:13:28 GMT

On Tue, Nov 13, 2007 at 08:56:36AM -0800, Michael Harris wrote:
>I have a question about file compression in Hadoop. When I set io.seqfile.compression.type=BLOCK,
>does this also compress the actual files I load into the DFS, or does it only control map/reduce
>file compression? If it doesn't compress files on the file system, is there any way to
>compress a file when it's loaded? The concern here is that I am just getting started with Pig/Hadoop
>and have a very small cluster of around 5 nodes. I want to limit IO wait by compressing the
>actual data. As a test, when I compressed our 4GB log file using rar it was only 280MB.

If you are loading files into HDFS as a SequenceFile and you set io.seqfile.compression.type=BLOCK
(or RECORD), the file will have compressed records. Equivalently, you can use one of the
many SequenceFile.createWriter methods (see http://lucene.apache.org/hadoop/api/org/apache/hadoop/io/SequenceFile.html)
to specify the compression type, compression codec, etc.
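For illustration, a minimal sketch of the createWriter approach might look like the following (the class name, output path, and key/value types here are my own assumptions, not from the original message):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

public class CompressedSeqFileWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/logs/sample.seq");  // illustrative path

        // BLOCK compression compresses batches of records together,
        // which generally gives better ratios than per-record (RECORD)
        // compression for log-style data.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out,
                LongWritable.class, Text.class,
                SequenceFile.CompressionType.BLOCK,
                new DefaultCodec());
        try {
            writer.append(new LongWritable(1L), new Text("first log line"));
        } finally {
            writer.close();
        }
    }
}
```

Reading the file back with SequenceFile.Reader decompresses transparently, so downstream map/reduce jobs do not need to know which compression type was used at write time.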

