hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Spill file compression
Date Wed, 07 Nov 2012 12:41:05 GMT
Yes, we do compress each spill output, using the same codec that is
specified for map (intermediate) output compression. However, the counted
bytes may reflect the uncompressed size of the records written, and not
the post-compression size.
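
To illustrate why the two figures can differ so much, here is a minimal
sketch in plain Java. It is not Hadoop code: the class SpillSizeDemo and its
methods are hypothetical, and Deflate from java.util.zip stands in for
Snappy, which the JDK does not ship. It only shows that a count taken before
compression and a count taken after it can be far apart for repetitive
records.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

// Hypothetical demo class, not part of Hadoop.
public class SpillSizeDemo {

    // Sum of the serialized record sizes before any compression is applied.
    static long rawBytes(List<String> records) {
        long total = 0;
        for (String r : records) {
            total += r.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Size of the same records after writing them through a compressing stream.
    // Assumption: Deflate is used here only as a stand-in for the Snappy codec.
    static long compressedBytes(List<String> records) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
            for (String r : records) {
                out.write(r.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        // Highly repetitive records compress very well, which widens the gap.
        List<String> records = Collections.nCopies(1000, "key\tvalue-value-value\n");
        System.out.println("bytes before compression: " + rawBytes(records));
        System.out.println("bytes after compression:  " + compressedBytes(records));
    }
}
```

If the counters behave as suggested above, a large ratio between them would
say more about how compressible the records are than about whether the
spills are being compressed.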

On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann
<sigurd.spieckermann@gmail.com> wrote:
> Hi guys,
> I've encountered a situation where the ratio between "Map output bytes" and
> "Map output materialized bytes" is quite huge and during the map-phase data
> is spilled to disk quite a lot. This is something I'll try to optimize, but
> I'm wondering if the spill files are compressed at all. I set
> mapred.compress.map.output=true and
> mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
> and everything else seems to be working correctly. Does Hadoop actually
> compress spills or just the final spill after finishing the entire map-task?
> Thanks,
> Sigurd

Harsh J
