hadoop-common-user mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: MR job scheduler
Date Fri, 21 Aug 2009 06:23:33 GMT

On Aug 20, 2009, at 9:20 PM, bharath vissapragada wrote:

> OK, I'll be a bit more specific.
> Suppose the map outputs 100 different keys.
> Consider a key "K" whose corresponding values may be on N different
> datanodes. Consider a datanode "D" which has the maximum number of
> values. So instead of moving the values on "D" to other systems, it
> is useful to bring in the values from the other datanodes to "D" to
> minimize the data movement and also the delay. The same holds for
> all the other keys. How does the scheduler take care of this?

Map-Reduce doesn't 'bring' values from N datanodes to the map. A map
gets a single block of data to work with, and N-1 other maps get the
other N-1 blocks; thus multiple maps might see the key K with different
values. Eventually the outputs of the maps, i.e. K and its values <V>,
all end up at the same reduce (chosen by the Partitioner). Please read
some of the widely available map-reduce literature for more details.
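The flow described above can be sketched as a toy, single-process Java simulation (assuming Java 9+ for List.of and Map.entry). This is not Hadoop API code: the class ShuffleSketch and its methods are hypothetical, and only the partition formula mirrors Hadoop's default HashPartitioner, (hash & Integer.MAX_VALUE) % numReduceTasks. Each "map task" sees only the pairs from its own block, yet every value for "K" ends up at a single reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy simulation of map -> shuffle -> reduce in one process.
 * Not Hadoop code: no datanodes, splits, or network transfer are modeled.
 */
public class ShuffleSketch {

    /** Same formula as Hadoop's default HashPartitioner (non-negative result). */
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /**
     * Routes every map task's (key, value) pairs to a reduce task and groups them by key.
     * Result: reduce task index -> (key -> all values seen for that key).
     */
    static Map<Integer, Map<String, List<Integer>>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int numReduceTasks) {
        Map<Integer, Map<String, List<Integer>>> reduceInputs = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapOutput : mapOutputs) {
            for (Map.Entry<String, Integer> kv : oneMapOutput) {
                // The reduce is chosen from the key alone, never from where the value came from.
                int r = partition(kv.getKey(), numReduceTasks);
                reduceInputs
                        .computeIfAbsent(r, x -> new TreeMap<>())
                        .computeIfAbsent(kv.getKey(), x -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        return reduceInputs;
    }

    public static void main(String[] args) {
        // Three hypothetical map tasks, one per block; key "K" appears in every block.
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("K", 1), Map.entry("A", 10)),
                List.of(Map.entry("K", 2), Map.entry("B", 20)),
                List.of(Map.entry("K", 3)));
        // prints {0={B=[20]}, 1={A=[10], K=[1, 2, 3]}}
        System.out.println(shuffle(mapOutputs, 2));
    }
}
```

Because the reduce is picked from the key alone, where the values originally lived plays no part in that choice; all of K's values from the three maps meet at reduce 1.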

