hadoop-user mailing list archives

From Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
Subject Spill file compression
Date Wed, 07 Nov 2012 12:32:34 GMT
Hi guys,

I've run into a job where the ratio of "Map output bytes" to
"Map output materialized bytes" is very large, and data is spilled to disk
frequently during the map phase. I'll try to reduce the spilling itself, but
I'm wondering whether the spill files are compressed at all. I set
mapred.compress.map.output=true
and mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
and everything else seems to be working correctly. Does Hadoop compress
every intermediate spill, or only the final spill written after the entire
map task has finished?
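
To make the ratio above concrete, here is a minimal sketch of how the two counters can be compared. It assumes the usual reading of the counters: "Map output bytes" is the serialized map output before compression, and "Map output materialized bytes" is what is actually written to disk. The function name and the counter values are hypothetical and not taken from a real job.

```python
def compression_ratio(map_output_bytes: int, materialized_bytes: int) -> float:
    """Return how many times larger the raw map output is than what hit disk."""
    if materialized_bytes <= 0:
        raise ValueError("materialized_bytes must be positive")
    return map_output_bytes / materialized_bytes


# Hypothetical counter values copied by hand from a job's counter page.
ratio = compression_ratio(8_000_000_000, 1_000_000_000)
print(f"{ratio:.1f}x")  # prints "8.0x"
```

A large ratio only says that the final map output is much smaller than the raw output. On its own it does not show whether the intermediate spill files were compressed as well.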

Thanks,
Sigurd
