hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From lohit <lohit...@yahoo.com>
Subject Re: How to enable compression of blockfiles?
Date Fri, 08 Aug 2008 22:42:50 GMT
I think at present only SequenceFiles can be compressed. http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html
If you have plain text files, they are stored as is into blocks. You can store them as .gz
and hadoop recognizes it and process the gz files. But its not splittable, meaning each map
will consume whole of .gz

----- Original Message ----
From: Michael K. Tung <mike@ai.stanford.edu>
To: core-user@hadoop.apache.org
Sent: Friday, August 8, 2008 1:09:01 PM
Subject: How to enable compression of blockfiles?

Hello, I have a simple question.  How do I configure DFS to store compressed
block files?  I've noticed by looking at the "blk_" files that the text
documents I am storing are uncompressed.  Currently our hadoop deployment is
taking up 10x the diskspace as compared to our system before moving to
hadoop. I've tried modifying the io.seqfile.compress.blocksize option
without success and haven't been able to find anything online regarding
this.  Is there any way to do this or do I need to manually compress my data
before storing to HDFS?


Michael Tung

View raw message