hadoop-common-user mailing list archives

From Johan Oskarsson <jo...@oskarsson.nu>
Subject Re: SequenceFileOutputFormat compression codec?
Date Fri, 16 Mar 2007 16:45:16 GMT
Arun C Murthy wrote:
> Hi Johan,
> 
> On Tue, Mar 13, 2007 at 05:50:21PM +0000, Johan Oskarsson wrote:
>> Hi.
>>
>> I can't seem to find out how to set the compression codec in a 
>> SequenceFile if it's created when a program runs with the output format 
>> set to SequenceFileOutputFormat.
>>
> 
> Use these knobs:
> mapred.output.compress (set this to 'true')
> io.seqfile.compression.type (NONE, RECORD, BLOCK)
> mapred.output.compression.codec (zlib, lzo etc.)
> 
> @see SequenceFileOutputFormat.getRecordWriter() for more info...
> 
> hth,
> Arun

Ah, thanks. Works great. The javadoc and name of the JobConf method 
setMapOutputCompressorClass, which sets "mapred.output.compression.codec", 
threw me off: it sounded like it only applied to the map -> reduce stage 
and not to the actual reduce output as well.
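
For anyone finding this in the archive, here is a minimal sketch of the three settings Arun listed. It uses a plain java.util.Properties object as a stand-in for JobConf so that it runs without Hadoop on the classpath; the class and method names are made up for illustration. It also assumes that the codec property takes a fully qualified codec class name, with org.apache.hadoop.io.compress.DefaultCodec being the zlib one, so check that against your Hadoop version.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hadoop: collects the three properties
// from this thread in a plain Properties object (stand-in for JobConf).
public class SeqFileCompressionSettings {

    public static Properties compressionSettings(String type, String codecClass) {
        // The thread lists NONE, RECORD and BLOCK as the valid types.
        if (!type.equals("NONE") && !type.equals("RECORD") && !type.equals("BLOCK")) {
            throw new IllegalArgumentException("unknown compression type: " + type);
        }
        Properties props = new Properties();
        // Turn on compression of the job output.
        props.setProperty("mapred.output.compress", "true");
        // How the SequenceFile is compressed: per record or per block.
        props.setProperty("io.seqfile.compression.type", type);
        // Assumption: the value is a codec class name rather than "zlib"/"lzo".
        props.setProperty("mapred.output.compression.codec", codecClass);
        return props;
    }

    public static void main(String[] args) {
        Properties props = compressionSettings(
                "BLOCK", "org.apache.hadoop.io.compress.DefaultCodec");
        // Print each key=value pair; in a real job these would be set on the JobConf.
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real job the same three key/value pairs would go on the JobConf, or into the job's configuration file, instead of a Properties object.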

Final question: does this also work for MapFileOutputFormat? I'm 
running a benchmark, and block vs. record compression seems to make a 
difference, but I believe it's only using the default compression codec 
(the output file size stays the same even though I change the codec with 
OutputFormatBase.setOutputCompressorClass). It works just fine with 
SequenceFileOutputFormat.


/Johan
