hadoop-user mailing list archives

From Amit Mittal <amitmitt...@gmail.com>
Subject Does all reducer take input from all NodeManager/Tasktrackers of Map tasks
Date Mon, 27 Jan 2014 12:17:26 GMT
Hi,

Does every reducer take input from all the NodeManagers/TaskTrackers that ran map tasks?

*Reference:* "Hadoop: The Definitive Guide", 3rd edition, by Tom White,
page 210 (Ch. 6: How MapReduce Works > Shuffle and Sort > The reducer side)

There is a note; here is the text from the book:
How do reducers know which machines to fetch map output from?
...
Therefore, for a given job, the jobtracker (or application master) knows
the mapping between map outputs and hosts. A thread in the reducer
periodically asks the master for map output hosts
until it has retrieved them all.
...
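The mechanism in the quoted note can be pictured with a small, self-contained Java model. It is only a sketch under assumptions: Master, mapCompleted(), eventsSince() and collectHosts() are hypothetical names, not Hadoop APIs, and a real cluster exchanges these events over RPC rather than through a shared in-memory object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapOutputLocator {

    // Hypothetical master: records (mapId, host) for every completed map, in completion order.
    static class Master {
        private final List<String[]> completions = new ArrayList<>();

        synchronized void mapCompleted(String mapId, String host) {
            completions.add(new String[] {mapId, host});
        }

        // Returns the completion events recorded at or after position fromIndex.
        synchronized List<String[]> eventsSince(int fromIndex) {
            return new ArrayList<>(completions.subList(fromIndex, completions.size()));
        }
    }

    // Reducer-side loop: keep polling the master until the hosts of all totalMaps outputs are known.
    static Map<String, String> collectHosts(Master master, int totalMaps) {
        Map<String, String> hostByMap = new LinkedHashMap<>();
        int seen = 0;
        while (hostByMap.size() < totalMaps) {
            List<String[]> events = master.eventsSince(seen);
            seen += events.size();
            for (String[] event : events) {
                hostByMap.put(event[0], event[1]);
            }
            if (hostByMap.size() < totalMaps) {
                try {
                    Thread.sleep(10); // wait before asking the master again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return hostByMap; // give up early if interrupted
                }
            }
        }
        return hostByMap;
    }

    public static void main(String[] args) throws InterruptedException {
        Master master = new Master();
        // Simulate map tasks finishing on different hosts while the reducer polls.
        Thread maps = new Thread(() -> {
            try {
                master.mapCompleted("m0", "host-a");
                Thread.sleep(20);
                master.mapCompleted("m1", "host-b");
                Thread.sleep(20);
                master.mapCompleted("m2", "host-a");
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        maps.start();
        Map<String, String> hosts = collectHosts(master, 3);
        maps.join();
        System.out.println(hosts); // {m0=host-a, m1=host-b, m2=host-a}
    }
}
```

Note that in this model the master only tracks which host holds each map's output; it never sees any keys.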
*Question 1:* I believe the TaskTracker, and then the JobTracker/AppMaster,
receives the updates through calls to Task.statusUpdate(TaskUmbilicalProtocol
obj). That is how the JobTracker/AM learns the location of the map's output
file, the host details, etc. However, how does it know which partitions or
keys this output contains? In other words, how does the JobTracker learn
about the data partitions/keys from the heartbeat? That information seems
necessary to decide whether a given mapper's output needs to be pulled or not.
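For context on Question 1: in stock Hadoop the partition is chosen on the map side, not by the master. Each map task runs the job's Partitioner over its output and writes one segment per reduce task, and the default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Because that mapping is deterministic, a reducer only needs its own partition number, not the keys. The sketch below re-implements the formula in plain Java (it does not use the Hadoop class); MapSidePartitioning, partitionFor() and partitionMapOutput() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapSidePartitioning {

    // Same arithmetic as the default HashPartitioner: clear the sign bit, then take the modulus.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // A map task splits its output into one segment per reducer.
    // Segment r holds only the keys destined for reducer r, and may be empty.
    static List<List<String>> partitionMapOutput(List<String> keys, int numReduceTasks) {
        List<List<String>> segments = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            segments.add(new ArrayList<>());
        }
        for (String key : keys) {
            segments.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<List<String>> segments = partitionMapOutput(Arrays.asList("a", "b", "c", "d"), 3);
        // "a".hashCode() is 97, so "a" lands in partition 97 % 3 = 1.
        System.out.println(segments); // [[c], [a, d], [b]]
    }
}
```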
*Question 2:* In short: is it true that not every reducer takes output from
every mapper, and that each reducer only connects to, and takes, the output
related to the keys partitioned to that particular reducer?
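Question 2 can be explored with a toy model. It assumes each map has already split its output into one segment per reducer (using the same masked-hash formula as the default HashPartitioner) and that reducer r asks every map for segment r, even when that segment turns out to be empty. ReduceSideFetch, partition() and fetchForReducer() are hypothetical names; the real shuffle also merge-sorts the fetched segments, which is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReduceSideFetch {

    // Map side (assumed): split one map task's keys into numReduceTasks segments by masked hash.
    static List<List<String>> partition(List<String> keys, int numReduceTasks) {
        List<List<String>> segments = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            segments.add(new ArrayList<>());
        }
        for (String key : keys) {
            segments.get((key.hashCode() & Integer.MAX_VALUE) % numReduceTasks).add(key);
        }
        return segments;
    }

    // Reduce side: reducer r asks every map for its segment r and concatenates the results.
    static List<String> fetchForReducer(List<List<List<String>>> segmentsByMap, int r) {
        List<String> received = new ArrayList<>();
        for (List<List<String>> mapSegments : segmentsByMap) {
            received.addAll(mapSegments.get(r)); // adds nothing if this map's segment r is empty
        }
        return received;
    }

    public static void main(String[] args) {
        List<List<List<String>>> segmentsByMap = Arrays.asList(
            partition(Arrays.asList("a", "c"), 2),   // map 0 -> [[], [a, c]]
            partition(Arrays.asList("b", "e"), 2));  // map 1 -> [[b], [e]]
        for (int r = 0; r < 2; r++) {
            System.out.println("reducer " + r + " contacted " + segmentsByMap.size()
                + " maps, received " + fetchForReducer(segmentsByMap, r));
        }
        // reducer 0 contacted 2 maps, received [b]
        // reducer 1 contacted 2 maps, received [a, c, e]
    }
}
```

In this model each reducer contacts every map but copies only the keys of its own partition; reducer 0 gets nothing from map 0 yet still asks it.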

Thanks
Amit Mittal
