hadoop-hdfs-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@apache.org>
Subject Re: Does all reducer take input from all NodeManager/Tasktrackers of Map tasks
Date Mon, 27 Jan 2014 17:06:34 GMT

On Jan 27, 2014, at 4:17 AM, Amit Mittal <amitmittal5@gmail.com> wrote:

> Question 1: I believe the TaskTracker, and then the JobTracker/AppMaster, receives
> updates through calls to Task.statusUpdate(TaskUmbilicalProtocol obj). From these the
> JobTracker/AM knows the location of the map's output file, the host details, etc. But how
> does it know which partitions or keys that output contains? In other words, how does the
> JobTracker learn about data partitions/keys from the heartbeat? It seems this would be needed
> to decide whether a given Mapper's output needs to be pulled or not.

Reducers pull map outputs from all the maps, so the JobTracker/AppMaster simply gives the
completion events of *all* the maps to every reducer. There is no need for the JT/AM to track
the distribution of keys.
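
To make that concrete, here is a minimal Python sketch of the idea. It is a toy simulation, not Hadoop code. The function names (partition, run_map, fetch_for_reducer), the reducer count and the sample records are all made up for illustration, and zlib.crc32 merely stands in for whatever key hash the job's partitioner uses. Each map splits its own output into one partition per reducer, and each reducer visits every completed map but pulls only the partition that matches its own index.

```python
import zlib

NUM_REDUCERS = 3  # assumed value, chosen only for this illustration


def partition(key: str, num_reducers: int) -> int:
    # Stand-in for a hash partitioner: stable hash of the key, modulo reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map(records, num_reducers):
    """Split one map's output into num_reducers partitions (some may be empty)."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        partitions[partition(key, num_reducers)].append((key, value))
    return partitions


def fetch_for_reducer(reducer_id, completed_maps):
    """Visit every completed map, but pull only this reducer's partition."""
    fetched = []
    for map_output in completed_maps:
        fetched.extend(map_output[reducer_id])
    return fetched


# Two hypothetical maps; the central coordinator never inspects their keys.
map_inputs = [
    [("apple", 1), ("banana", 1)],
    [("apple", 1), ("cherry", 1)],
]
completed_maps = [run_map(records, NUM_REDUCERS) for records in map_inputs]

# Every reducer is handed the same list of completed maps.
reducer_inputs = [
    fetch_for_reducer(r, completed_maps) for r in range(NUM_REDUCERS)
]
```

Note that nothing in the sketch needs a central record of which keys live where: the partition index alone tells each reducer what to ask every map for.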

> Question 2: In short, not all reducers take output from all Mappers; they only connect
> and take the output related to the keys partitioned for that particular reducer.

That is in a sense correct. More clearly, all Reducers get a small chunk of output from all


