hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: MapRed ports
Date Wed, 10 Feb 2010 09:17:08 GMT

On Feb 9, 2010, at 9:47 PM, psdc1978 wrote:

> Hi,
>
> I have some questions about the MapRed ports and how a reduce knows
> where to fetch the map output from.
>
> I know that MapRed uses Jetty as a web server.
>
> - Does the JobTracker send tasks to the TaskTracker for execution
> through port 50060?
>

The TaskTracker (TT) periodically sends a heartbeat RPC to the
JobTracker (JT); the response to it contains the new tasks to be
launched.
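
A minimal sketch of that pull model, for illustration only. The class
and method names (ToyJobTracker, submit, heartbeat) are hypothetical
and are not Hadoop's actual RPC interface; the point is only that the
tracker initiates the call and receives its work in the return value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the JobTracker side of the heartbeat RPC.
class ToyJobTracker {
    private final Queue<String> pendingTasks = new ArrayDeque<>();

    // Queue a task that is waiting to be scheduled.
    void submit(String taskId) {
        pendingTasks.add(taskId);
    }

    // Called by a tracker. The return value plays the role of the
    // heartbeat response: at most freeSlots tasks for it to launch.
    // trackerName is unused here; a real scheduler would consult it.
    List<String> heartbeat(String trackerName, int freeSlots) {
        List<String> toLaunch = new ArrayList<>();
        while (toLaunch.size() < freeSlots && !pendingTasks.isEmpty()) {
            toLaunch.add(pendingTasks.poll());
        }
        return toLaunch;
    }
}

class HeartbeatDemo {
    public static void main(String[] args) {
        ToyJobTracker jt = new ToyJobTracker();
        jt.submit("map_0");
        jt.submit("map_1");
        jt.submit("reduce_0");
        // The tracker asks; the JobTracker never connects to the tracker.
        System.out.println(jt.heartbeat("tracker_a", 2));
        System.out.println(jt.heartbeat("tracker_b", 2));
    }
}
```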

> - Which port does the TaskTracker use to send status about the task
> that it's executing to the JobTracker? Is it port 50030?
>

The TT uses the JT's RPC port (which is *not* 50030 by default),  
configured by mapred.job.tracker.
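
For illustration, mapred.job.tracker normally holds a host:port pair
(or the literal value "local"). A rough sketch of splitting such a
value follows; the host name and port 9001 are made-up examples, and
JobTrackerAddress is a hypothetical helper rather than a Hadoop class.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of Hadoop.
class JobTrackerAddress {
    // Splits a host:port string such as a mapred.job.tracker value.
    static InetSocketAddress parse(String value) {
        int colon = value.lastIndexOf(':');
        if (colon < 0) {
            // Covers "local", which names no RPC endpoint at all.
            throw new IllegalArgumentException(
                "expected host:port, got " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        // Unresolved: no DNS lookup is attempted for the example host.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress jt = parse("jt.example.com:9001");
        System.out.println(jt.getHostString() + " " + jt.getPort());
    }
}
```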
>
> - The Reduce task in the shuffle phase must copy the map outputs. In
> which class is the code where Reduce fetches the map output? Is this
> part of the code executed by the TaskTracker process?
>

The reduce task itself (in a separate JVM from the TT) fetches map
outputs; see o.a.h.mapred.ReduceTask:ReduceCopier.fetchOutputs().

> - Is the location of the map output that the reduce task uses sent by
> the JobTracker? If so, this means that the JobTracker was informed by
> the TaskTracker where a map ran, right?
>

The JT knows where each successful map task was scheduled. The reduce
task gets this information via TaskCompletionEvents
(ReduceTask.ReduceCopier.GetMapEventsThread).
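
As a simplified sketch of that idea: the reduce side keeps a cursor
into an append-only list of completion events and asks only for events
it has not seen yet, recording where each map's output can be fetched.
All names below (ToyCompletionEvent, ToyEventLog, ToyReduceCopier) and
the node addresses are hypothetical; the real thread and event classes
carry more state (task status, attempt IDs, obsolete events).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down completion event.
class ToyCompletionEvent {
    final String mapTaskId;
    final String trackerHttp; // where the map output is served from

    ToyCompletionEvent(String mapTaskId, String trackerHttp) {
        this.mapTaskId = mapTaskId;
        this.trackerHttp = trackerHttp;
    }
}

// Hypothetical append-only event list kept on the JobTracker side.
class ToyEventLog {
    private final List<ToyCompletionEvent> events = new ArrayList<>();

    void mapSucceeded(String mapTaskId, String trackerHttp) {
        events.add(new ToyCompletionEvent(mapTaskId, trackerHttp));
    }

    // Returns up to maxEvents events starting at index fromEventId.
    List<ToyCompletionEvent> eventsFrom(int fromEventId, int maxEvents) {
        int end = Math.min(events.size(), fromEventId + maxEvents);
        if (fromEventId >= end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(events.subList(fromEventId, end));
    }
}

// Hypothetical reduce-side poller that learns map output locations.
class ToyReduceCopier {
    private final Map<String, String> mapLocations = new LinkedHashMap<>();
    private int nextEventId = 0;

    // Fetches unseen events and returns how many were new.
    int poll(ToyEventLog log, int maxEvents) {
        List<ToyCompletionEvent> fresh = log.eventsFrom(nextEventId, maxEvents);
        for (ToyCompletionEvent e : fresh) {
            mapLocations.put(e.mapTaskId, e.trackerHttp);
        }
        nextEventId += fresh.size();
        return fresh.size();
    }

    Map<String, String> locations() {
        return mapLocations;
    }
}
```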

> - Is the class org.apache.hadoop.mapred.ReduceTask used? If so,
> which process uses this class? Is it the TaskTracker process?
>

That code runs in the ReduceTask's child JVM.

Arun