hadoop-mapreduce-user mailing list archives

From Ann Pal <ann_r_...@yahoo.com>
Subject Map Reduce Phase questions:
Date Fri, 16 Dec 2011 13:33:18 GMT
I had some questions specifically on the Map-Reduce phase:

[1] In the reduce phase, the TaskTracker on each reduce node polls the JobTracker to learn
which maps have completed. Once the JobTracker reports a completed map, the TaskTracker
pulls the data from the node where that map ran. This is a "pull" model, as opposed to a
"push" model in which the map sends each region of its output directly to the appropriate
reduce node. Is the pull model the default in 0.20, 0.23, etc.?

In the pull model, how does the reduce node know which region of the map output it is
responsible for? Is this determined up front, and where does it get this information?
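
To make the question in [1] concrete, here is a toy Python sketch of the pull model as
described there. It is not Hadoop code: ToyJobTracker, run_map, reduce_pull and
partition_for are hypothetical names, and the hash-modulo rule for assigning a key to a
region is only an assumption about how the assignment might work.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reduces):
    # Assumption: a key's region is a deterministic hash modulo the reduce count.
    return zlib.crc32(key.encode("utf-8")) % num_reduces


def run_map(records, num_reduces):
    # A map task splits its output into one region per reduce task.
    regions = defaultdict(list)
    for key, value in records:
        regions[partition_for(key, num_reduces)].append((key, value))
    return regions


class ToyJobTracker:
    # Hypothetical stand-in that only records which maps have completed.
    def __init__(self):
        self.completed = {}  # map_id -> regions produced by that map

    def map_finished(self, map_id, regions):
        self.completed[map_id] = regions

    def completed_maps(self):
        return sorted(self.completed)


def reduce_pull(tracker, reduce_id, already_fetched):
    # The reduce side polls for completed maps, then pulls only its own region.
    pulled = []
    for map_id in tracker.completed_maps():
        if map_id in already_fetched:
            continue
        pulled.extend(tracker.completed[map_id].get(reduce_id, []))
        already_fetched.add(map_id)
    return pulled
```

In this sketch the reduce side needs only its own reduce id and the total number of reduce
tasks to know which region to fetch. Whether Hadoop does the same is what I am asking.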

[2] There can be multiple reduce tasks per reduce node. The number of reduce tasks is
configurable, but what about the number of reduce nodes? How is this determined?

[3] Before 0.23, the map and reduce task slots for a node are allocated statically. Is this
based solely on configuration?

Thanks in advance!