hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Partition - Distribution of map outputs
Date Wed, 30 Jun 2010 20:36:44 GMT

On Jun 30, 2010, at 1:29 PM, Pedro Costa wrote:
> As I understand from what I've read, the purpose of the partition is
> to tell each reducer which map output it will receive.
> For example, if I have 3 split files and 2 reduces defined in my
> example, the map side will produce 3 map outputs (one map
> per split file) and the reduce side will produce 2 part-*
> files. part-00000 will contain the results of 2 map outputs
> and part-00001 will contain the results of 1 map output.
>

Typically, each map produces output for every reduce, so a reduce does
not own whole map outputs; it fetches its share from each map.

In your example, part-00000 will contain the output of reduce-0 and
part-00001 will contain the output of reduce-1.

> - My question is, where in Hadoop MR is it decided "which Reduce
> gets which Map Output"? Is it during the creation of the reduce
> tasks, or in another phase of the MR job?
>
> - Can you point me to the class that does this distribution of map
> outputs to the reduce tasks?
>

Take a look at the Partitioner - the partitioner for the job decides  
which keys are sent to which reduce.
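
As a rough illustration of the point above, here is a minimal plain-Java
sketch that does not use the Hadoop classes. The class name
PartitionSketch and the helpers partitionFor and route are hypothetical
names made up for this example. The formula mirrors what the default
hash-based partitioner does (mask off the sign bit of the key's hash,
then take the remainder by the number of reduces); a job may configure a
different partitioner, in which case the routing rule will differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Hypothetical helper: same arithmetic as a hash-based partitioner.
    // Masking with Integer.MAX_VALUE keeps the result non-negative.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: groups the keys emitted by ONE map task
    // by the index of the reduce that will receive them.
    public static Map<Integer, List<String>> route(List<String> mapOutputKeys,
                                                   int numReduceTasks) {
        Map<Integer, List<String>> byReduce = new TreeMap<>();
        for (int r = 0; r < numReduceTasks; r++) {
            byReduce.put(r, new ArrayList<>());
        }
        for (String key : mapOutputKeys) {
            byReduce.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return byReduce;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2; // as in the question: 2 reduces
        // Made-up keys for 3 map tasks (one per split file).
        List<List<String>> maps = List.of(
                List.of("a", "b", "c"),
                List.of("b", "d"),
                List.of("a", "c", "d"));
        for (int m = 0; m < maps.size(); m++) {
            System.out.println("map-" + m + " -> "
                    + route(maps.get(m), numReduceTasks));
        }
    }
}
```

Running main shows every map contributing keys to both reduce-0 and
reduce-1, and the same key always landing on the same reduce, which is
why part-00000 is not simply "the output of 2 maps".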

Arun
