hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject combiner statistics
Date Tue, 05 Jan 2010 22:26:42 GMT
Hi all,
When I run a MapReduce job with a combiner, I find that the combiner input record count is greater than the map output record count, and the combiner output record count is greater than the reduce input record count. My understanding is that the combiner has to re-read some records that were already spilled to disk and combine them with records that arrive later. These re-read records are counted as "input" again, which inflates the combiner input counter. Similarly, because the same key may be combined multiple times, its record is written to disk multiple times, which inflates the combiner output counter.
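To make my understanding of the counter arithmetic concrete, here is a small simulation (plain Python, not Hadoop code; the spill size and data are invented) of a map task that runs the combiner once per spill and once more at the final merge. Whether Hadoop actually re-runs the combiner at the merge depends on the number of spills, but when it does, the already-combined records get counted as combiner input a second time:

```python
from collections import defaultdict

def combine(records):
    """Sum values per key, like a word-count combiner."""
    sums = defaultdict(int)
    for k, v in records:
        sums[k] += v
    return sorted(sums.items())

def map_task(map_output, spill_size):
    """Simulate spilling: combine each spill, then combine the merged
    spills once more. Returns (combine_in, combine_out, final_records)."""
    combine_in = combine_out = 0
    spills = []
    for i in range(0, len(map_output), spill_size):
        spill = map_output[i:i + spill_size]
        combined = combine(spill)
        combine_in += len(spill)
        combine_out += len(combined)
        spills.append(combined)
    # Final merge pass: records already combined per spill are fed
    # through the combiner again, so they are counted as input again.
    merged_input = [r for s in spills for r in s]
    final = combine(merged_input)
    combine_in += len(merged_input)
    combine_out += len(final)
    return combine_in, combine_out, final

# 6 map output records, spilled every 3 records
output = [("a", 1), ("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
cin, cout, final = map_task(output, spill_size=3)
print(len(output), cin, cout, len(final))  # 6 map outputs, 11 combine inputs,
                                           # 8 combine outputs, 3 reduce inputs
```

Here combiner input (11) exceeds map output (6), and combiner output (8) exceeds reduce input (3), which matches the counter relationship I observed.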

Please correct me if there is a problem in my understanding.

Besides, I am not sure whether the combiner guarantees that there is only one record per distinct key in the output of each map task, or does it just combine on a "best effort" basis?



