hadoop-mapreduce-user mailing list archives

From Eric Sammer <e...@lifeless.net>
Subject Re: MapRed ports
Date Wed, 10 Feb 2010 06:16:34 GMT

I can answer at least some of these questions for you. See below.

On 2/10/10 12:47 AM, psdc1978 wrote:
> Hi,
> I have some questions about the MapRed ports and how a reduce knows where
> to fetch the map output from.
> I know that MapRed uses Jetty as a webserver.
> - Does the JobTracker send tasks to the TaskTracker to execute through
> port 50060?

Task Trackers contact the Job Tracker every N seconds and report health
and the number of available slots for tasks. The response of this
heartbeat from the JT is where work may actually be assigned. So the JT
doesn't really send work to the TT as much as the TT checks in and
happens to get work in the response.
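
To make the pull model concrete, here is a minimal sketch in plain Java.
It is not Hadoop's actual code: the class and method names
(SketchJobTracker, heartbeat, submit) are made up for illustration. The
point is only that the tracker calls in and the assignment rides back in
the response.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the JobTracker: it never pushes work,
// it only answers heartbeats.
class SketchJobTracker {
    private final Queue<String> pendingTasks = new ArrayDeque<>();

    void submit(String taskId) {
        pendingTasks.add(taskId);
    }

    // Called by a tracker reporting its free slots; the return value
    // (the heartbeat response) carries any newly assigned tasks.
    List<String> heartbeat(String trackerName, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        while (assigned.size() < freeSlots && !pendingTasks.isEmpty()) {
            assigned.add(pendingTasks.poll());
        }
        return assigned;
    }
}

class HeartbeatSketch {
    public static void main(String[] args) {
        SketchJobTracker jt = new SketchJobTracker();
        jt.submit("map_0");
        jt.submit("map_1");
        jt.submit("map_2");

        // A tracker with two free slots checks in and gets two tasks back.
        System.out.println(jt.heartbeat("tt1", 2));
        // The next check-in picks up the one remaining task.
        System.out.println(jt.heartbeat("tt1", 2));
    }
}
```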

> - Which port does the TaskTracker use to send status about the task that
> it's executing to the JobTracker? Is it port 50030?

Task trackers talk to the JT on the port specified by mapred.job.tracker.
Port 50030 is the JT's web UI.
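
The mapred.job.tracker value is a host:port pair, so the port the task
trackers use is whatever you configured there. A small sketch of pulling
the two parts out in plain Java (this is not how Hadoop itself parses
it; the class name and the example host and port are made up):

```java
import java.net.InetSocketAddress;

class JobTrackerAddressSketch {
    // Splits a "host:port" string, the form used for mapred.job.tracker.
    static InetSocketAddress parse(String value) {
        int colon = value.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port, got " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        // Unresolved: we only want the parts, not a DNS lookup.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        // Hypothetical value; substitute whatever your config file says.
        InetSocketAddress jt = parse("jt.example.com:9001");
        System.out.println("JT host: " + jt.getHostName());
        System.out.println("JT port: " + jt.getPort());
    }
}
```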

> - MapReduce uses port 54311. What is this port for?

Not entirely sure.

> - The Reduce task in the shuffle phase must copy the map outputs. In
> which class is the code where Reduce fetches the map output? Is this
> part of the code executed by the TaskTracker process?

I don't remember the exact class and method, but I would look around
ReduceTask and the other classes in o.a.h.mapred.*. You'll find it. Task
trackers do pull the map output, yes, so this fetch code is run on the
task trackers. I'm not 100% sure if this happens as part of the reduce
task or if it happens before the reduce task JVM is forked. I don't
want to guess and confuse things further.

> - Is the location of the map output that the reduce task uses sent by
> the JobTracker? If so, this means that the JobTracker was informed by
> the task tracker where a map ran, right?

Someone can correct me if I'm wrong, but I'm pretty sure it goes:

1. The JT assigns map tasks to TTs based on locality, etc.
2. They finish and report status back to the JT.
3. The JT assigns a reduce task to a TT and tells it which TTs hold the
map output. The TTs fetch the map output via HTTP calls.
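
Roughly, steps 2 and 3 could be sketched like this in plain Java. This
is an illustration only, not Hadoop's code: the class and method names
are made up, and the URL layout (a /mapOutput path on the TT's HTTP
port, taken here to be 50060, with job, map and reduce parameters) is my
assumption about what the TT's web server exposes, so check the source
before relying on it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShuffleSketch {
    // Step 2: what the JT learns as map tasks report completion
    // (map task id -> host of the TT that ran it).
    private final Map<String, String> mapOutputHosts = new LinkedHashMap<>();

    void mapFinished(String mapTaskId, String trackerHost) {
        mapOutputHosts.put(mapTaskId, trackerHost);
    }

    // Step 3: the URLs the reduce side would fetch over HTTP for one
    // reduce partition, one per finished map task.
    // ASSUMPTION: port 50060 and the /mapOutput query layout.
    List<String> fetchUrls(String jobId, int reducePartition) {
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, String> e : mapOutputHosts.entrySet()) {
            urls.add("http://" + e.getValue() + ":50060/mapOutput?job=" + jobId
                    + "&map=" + e.getKey() + "&reduce=" + reducePartition);
        }
        return urls;
    }

    public static void main(String[] args) {
        ShuffleSketch jt = new ShuffleSketch();
        // Hypothetical task ids and host names.
        jt.mapFinished("map_0", "tt1.example.com");
        jt.mapFinished("map_1", "tt2.example.com");
        for (String url : jt.fetchUrls("job_1", 0)) {
            System.out.println(url);
        }
    }
}
```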

> - Is the class org.apache.hadoop.mapred.ReduceTask used? If so, which
> process uses this class? Is it the TaskTracker process?

ReduceTask is, if I recall, the class that does what MapRunner does, but
for reduce tasks. I'm pretty sure this is the core of what happens when
a TT runs a reduce task. The TT is what runs / uses this class, yes.

The book "Hadoop: The Definitive Guide" has a good section that describes
the process of map reduce (although I don't think the author gets into
the specific class names, etc.).

Hope this helps. If I've said anything wrong, I'm very happy to have
people correct me.

Eric Sammer
