hadoop-common-user mailing list archives

From Tim Kiefer <tim-kie...@gmx.de>
Subject How are intermediate key/value pairs materialized between map and reduce?
Date Tue, 23 Feb 2010 11:44:28 GMT

Hi there,

Can anybody help me clear up a (most likely) simple point of confusion?

I am wondering how intermediate key/value pairs are materialized. I have 
a job where the map phase produces 600,000 records and map output bytes 
is ~300GB. What I thought (up to now) is that these 600,000 records, 
i.e., 300GB, are materialized locally by the mappers and that later on 
reducers pull these records (based on the key).
What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter 
is as high as ~900GB.

So where does the factor of 3 between Map output bytes and
FILE_BYTES_WRITTEN come from? I thought about the replication factor of
3 in the file system, but that should apply to HDFS only, shouldn't it?
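
To make the numbers concrete, here is a small back-of-the-envelope sketch of the ratio I am asking about. The figures are the approximate counter values quoted above; the variable names are only for illustration, and I am assuming decimal gigabytes.

```python
# Approximate counter values from the job described above.
# Assumption: "GB" is read as decimal gigabytes (10**9 bytes).
GB = 10**9

map_output_records = 600_000
map_output_bytes = 300 * GB      # "Map output bytes" counter, roughly
file_bytes_written = 900 * GB    # FILE_BYTES_WRITTEN counter, roughly

# Ratio between bytes written to local files and bytes emitted by the mappers.
ratio = file_bytes_written / map_output_bytes

# Average size of a single intermediate record.
avg_record_bytes = map_output_bytes / map_output_records

print(f"FILE_BYTES_WRITTEN / map output bytes: {ratio:.1f}")
print(f"average map output record size: {avg_record_bytes / 10**6:.1f} MB")
```

So each record is about half a megabyte on average, and every map output byte seems to be written to local files roughly three times.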

- tim
