hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: Basic question on how reducer works
Date Mon, 09 Jul 2012 20:33:59 GMT

On Jul 9, 2012, at 12:55 PM, Grandl Robert wrote:

> Thanks a lot, guys, for the answers.
> 
> I am still not able to find the exact code for the following:
> 
> 1. Where the reducer reads only its own partition from a map output. I looked into
> ReduceTask#getMapOutput, which does the actual read in ReduceTask#shuffleInMemory, but I
> don't see where it specifies which partition (reduceID) to read.
> 

Look at TaskTracker.MapOutputServlet.
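To make the servlet's job concrete: the reducer's request names a map and its own reduce id, and the server uses that map's index to turn the reduce id into a byte range of the single map output file. Below is a minimal in-memory sketch under simplifying assumptions. MapOutputSlicer, IndexEntry and partitionFor are hypothetical names, not Hadoop APIs, and the real servlet streams from local disk over HTTP rather than copying arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Minimal sketch: serve only one reducer's partition out of a single map output file.
 * MapOutputSlicer, IndexEntry and partitionFor are hypothetical names, not Hadoop APIs.
 */
class MapOutputSlicer {

    /** Location of one partition inside the map output file (assumed fields). */
    static final class IndexEntry {
        final long startOffset; // first byte of this partition in the output file
        final long partLength;  // number of bytes that belong to this partition

        IndexEntry(long startOffset, long partLength) {
            this.startOffset = startOffset;
            this.partLength = partLength;
        }
    }

    /**
     * Returns the bytes of partition {@code reduce}. The index supplies the offset
     * and length directly, so no other partition's data is read.
     */
    static byte[] partitionFor(byte[] mapOutputFile, IndexEntry[] index, int reduce) {
        if (reduce < 0 || reduce >= index.length) {
            throw new IllegalArgumentException("no such partition: " + reduce);
        }
        IndexEntry e = index[reduce];
        int from = Math.toIntExact(e.startOffset);
        int to = Math.toIntExact(e.startOffset + e.partLength);
        return Arrays.copyOfRange(mapOutputFile, from, to);
    }

    public static void main(String[] args) {
        // One map output holding three partitions back to back: "aa" | "bbb" | "c".
        byte[] file = "aabbbc".getBytes(StandardCharsets.US_ASCII);
        IndexEntry[] index = {
            new IndexEntry(0, 2), new IndexEntry(2, 3), new IndexEntry(5, 1)
        };
        // A fetch from reduce 1 for this map resolves to the slice for partition 1.
        byte[] slice = partitionFor(file, index, 1);
        System.out.println(new String(slice, StandardCharsets.US_ASCII)); // prints "bbb"
    }
}
```

The reducer only has to say who it is; all knowledge of where its bytes live stays on the serving side.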

> 2. I still don't fully understand in which part of the code (MapTask.java) the intermediate
> data is written to which partition. So MapOutputBuffer is the one that actually writes the
> data to the buffer and spills it once the buffer is full. Could you please elaborate a bit on
> how each record ends up in its partition?
> 

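Essentially, you can think of the partition-id as the 'primary key' and the actual 'key' of
the map-output <key, value> pairs as the 'secondary key'. Records are sorted by partition
first and by key within each partition, so each reducer's data ends up as one contiguous,
sorted run.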
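One way to picture this: each <key, value> emitted by the map is tagged with a partition id computed from the key and the number of reduce tasks (the formula below is the one used by Hadoop's default HashPartitioner), and the buffered records are then sorted with the partition as the primary key and the map-output key as the secondary key. This sketch assumes String keys and uses hypothetical class names. It is not MapOutputBuffer's actual code, which works on serialized records in an in-memory buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch: tag each map-output record with a partition id, then sort by
 * (partition, key) so every reducer's records form one contiguous, sorted run.
 * PartitionSort and Record are hypothetical names, not Hadoop classes.
 */
class PartitionSort {

    static final class Record {
        final int partition;
        final String key;
        final String value;

        Record(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return partition + ": " + key + " -> " + value;
        }
    }

    /** Same formula as Hadoop's default HashPartitioner; the mask keeps the result non-negative. */
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Partition id is the primary sort key; the map-output key is the secondary one. */
    static final Comparator<Record> PARTITION_THEN_KEY = (a, b) -> {
        int byPartition = Integer.compare(a.partition, b.partition);
        return byPartition != 0 ? byPartition : a.key.compareTo(b.key);
    };

    static List<Record> partitionAndSort(List<String[]> pairs, int numReduceTasks) {
        List<Record> records = new ArrayList<>();
        for (String[] kv : pairs) {
            records.add(new Record(getPartition(kv[0], numReduceTasks), kv[0], kv[1]));
        }
        records.sort(PARTITION_THEN_KEY);
        return records;
    }

    public static void main(String[] args) {
        // numReduceTasks comes from the job configuration, so every map task agrees on it.
        int numReduceTasks = 2;
        List<String[]> pairs = Arrays.asList(
            new String[] {"pear", "1"}, new String[] {"apple", "2"},
            new String[] {"fig", "3"}, new String[] {"apple", "4"});
        for (Record r : partitionAndSort(pairs, numReduceTasks)) {
            System.out.println(r);
        }
    }
}
```

Because the partition count is fixed per job, every map computes the same partition for the same key, which is what lets a reducer collect all values for a key from every map.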

hth,
Arun

> Thanks,
> Robert
> 
> From: Arun C Murthy <acm@hortonworks.com>
> To: mapreduce-user@hadoop.apache.org 
> Sent: Monday, July 9, 2012 9:24 AM
> Subject: Re: Basic question on how reducer works
> 
> Robert,
> 
> On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
> 
>> Hi,
>> 
>> I have some questions related to basic functionality in Hadoop. 
>> 
>> 1. When a Mapper processes the intermediate output data, how does it know how many partitions
>> to create (i.e. how many reducers there will be) and how much data goes into each partition
>> for each reducer?
>> 
>> 2. When the JobTracker assigns a task to a reducer, it also specifies the locations of the
>> intermediate output data it should retrieve, right? But how does a reducer know which portion
>> it has to retrieve from each remote location holding intermediate output?
> 
> To add to Harsh's comment: essentially, the TT (TaskTracker) *knows* where the output of a
> given map-id/reduce-id pair is located via an output-file/index-file combination.
> 
> Arun
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 
> 
> 
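A sketch of the output-file/index-file idea from the earlier reply: because every index record has a fixed size, the record for reduce r sits at byte r times the record length. The map id selects which output/index file pair to open, and the reduce id selects a single record without scanning. The layout assumed here (three big-endian longs per partition: start offset, raw length, on-disk length) is modelled loosely on Hadoop 1.x's spill index; the real file also carries a checksum, which is omitted. IndexFile, write and read are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/**
 * Sketch of a per-map index file with one fixed-size record per reduce partition.
 * Assumed layout: three big-endian longs (startOffset, rawLength, partLength).
 * The real Hadoop index also ends with a checksum, omitted here. IndexFile is hypothetical.
 */
class IndexFile {
    static final int RECORD_LENGTH = 3 * Long.BYTES; // 24 bytes per partition

    /** Writes one record per partition, in partition order. */
    static byte[] write(long[] startOffsets, long[] rawLengths, long[] partLengths) {
        if (rawLengths.length != startOffsets.length || partLengths.length != startOffsets.length) {
            throw new IllegalArgumentException("one entry per partition expected in each array");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int p = 0; p < startOffsets.length; p++) {
                out.writeLong(startOffsets[p]); // where partition p starts in the output file
                out.writeLong(rawLengths[p]);   // its uncompressed length (assumed field)
                out.writeLong(partLengths[p]);  // its length as stored on disk (assumed field)
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream, but DataOutputStream declares it.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the record for one reduce id by jumping straight to its offset. */
    static long[] read(byte[] indexFile, int reduce) {
        ByteBuffer buf = ByteBuffer.wrap(indexFile); // big-endian, like DataOutputStream
        int pos = reduce * RECORD_LENGTH;
        return new long[] {
            buf.getLong(pos), buf.getLong(pos + Long.BYTES), buf.getLong(pos + 2 * Long.BYTES)
        };
    }

    public static void main(String[] args) {
        // Three partitions of 2, 3 and 1 bytes stored back to back, uncompressed.
        byte[] index = write(new long[] {0, 2, 5}, new long[] {2, 3, 1}, new long[] {2, 3, 1});
        long[] rec = read(index, 1);
        System.out.println("reduce 1: offset=" + rec[0] + " length=" + rec[2]);
        // prints "reduce 1: offset=2 length=3"
    }
}
```

Fixed-size records trade a little space for constant-time lookup, which matters when many reducers fetch from the same map output concurrently.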

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


