hadoop-mapreduce-user mailing list archives

From Virajith Jalaparti <virajit...@gmail.com>
Subject How does a ReduceTask determine which MapTask output to read?
Date Wed, 29 Jun 2011 21:28:42 GMT

I was wondering what scheduling algorithm Hadoop (version 0.20.2 in
particular) uses to determine the order in which a ReduceTask reads the
map outputs from the various mappers that have run. For example, suppose
we have 10 maps called map1, map2, ..., map10, and 2 reducers, r1 and
r2. Which map's output do r1 and r2 each read first?

Also, suppose that mapred.reduce.parallel.copies is set to 5. Then do
r1 and r2 each read from 5 map outputs concurrently?
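To make the second question concrete, here is a toy Java model of the scenario above (10 maps, reducers r1 and r2, mapred.reduce.parallel.copies = 5). It assumes, hypothetically, that the limit caps the number of in-flight fetches within each reduce task rather than across the whole job. It does not reproduce Hadoop's actual copy scheduler or its fetch ordering. The class and method names (ShuffleModel, runReducer, runTwoReducers) are illustrative only and are not Hadoop APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ShuffleModel {
    // Hypothetical stand-in for mapred.reduce.parallel.copies, ASSUMED to apply per reduce task.
    static final int PARALLEL_COPIES = 5;
    static final int NUM_MAPS = 10;

    // Fetches in flight across all reducers in this model.
    static final AtomicInteger globalInFlight = new AtomicInteger();
    static final AtomicInteger globalPeak = new AtomicInteger();

    /** One reducer copies its partition from each of NUM_MAPS map outputs using its own
     *  pool of PARALLEL_COPIES copier threads; returns this reducer's peak concurrency. */
    static int runReducer() {
        ExecutorService copiers = Executors.newFixedThreadPool(PARALLEL_COPIES);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int m = 0; m < NUM_MAPS; m++) {
            copiers.submit(() -> {
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                globalPeak.accumulateAndGet(globalInFlight.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(50); // stand-in for transferring one map output segment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    globalInFlight.decrementAndGet();
                }
            });
        }
        copiers.shutdown();
        try {
            copiers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    /** Runs r1 and r2 at the same time; returns {peak(r1), peak(r2), peak across both}. */
    static int[] runTwoReducers() {
        int[] peaks = new int[2];
        Thread r1 = new Thread(() -> peaks[0] = runReducer());
        Thread r2 = new Thread(() -> peaks[1] = runReducer());
        r1.start();
        r2.start();
        try {
            r1.join();
            r2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {peaks[0], peaks[1], globalPeak.get()};
    }

    public static void main(String[] args) {
        int[] p = runTwoReducers();
        System.out.println("r1 peak=" + p[0] + ", r2 peak=" + p[1] + ", combined peak=" + p[2]);
    }
}
```

Under that assumption, neither reducer's peak can exceed 5, while the combined peak across r1 and r2 can reach 10. If the limit were instead shared across the job, the combined figure would be capped at 5. That difference is what the question is trying to pin down.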
