hadoop-common-dev mailing list archives

From "Owen O'Malley" <...@yahoo-inc.com>
Subject Re: Shuffle phase
Date Mon, 03 Mar 2008 00:23:41 GMT

On Mar 2, 2008, at 12:53 PM, momina khan wrote:

> I have trouble understanding what the shuffle phase is exactly. Can
> anyone please explain it for me, and also point out the name of the
> class in which the shuffle thread runs, and the class that spawns the
> thread itself?

The shuffle phase is the movement of data from the map outputs to the
reduce inputs. In general, each reduce collects output from every map,
which is why it is called the "shuffle". The TaskTracker where a map
ran has a Jetty server that serves that map's outputs over HTTP. The
ReduceTask copies the map outputs as the maps finish. You can look at
ReduceTask.java for the client side of the shuffle.
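
To make the data motion concrete, here is a toy, in-memory sketch of the
pattern described above. It is not the code in ReduceTask.java: the class
name ShuffleSketch and the methods runMap and fetchForReduce are
hypothetical, there is no HTTP or Jetty involved, and assigning keys to
reduces by hash is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the shuffle: every reduce pulls its slice from every map.
// All names here are hypothetical; this is not Hadoop's implementation.
public class ShuffleSketch {

    // Stand-in for one map task: splits its records into one bucket per reduce.
    // Hash partitioning is an assumption for this sketch.
    public static List<List<String>> runMap(List<String> keys, int numReduces) {
        List<List<String>> partitions = new ArrayList<>();
        for (int r = 0; r < numReduces; r++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            int p = (key.hashCode() & Integer.MAX_VALUE) % numReduces;
            partitions.get(p).add(key);
        }
        return partitions;
    }

    // Stand-in for the reduce side: copy bucket r from each map's output.
    // In the real system this copy is a fetch from the TaskTracker that ran the map.
    public static List<String> fetchForReduce(List<List<List<String>>> mapOutputs, int r) {
        List<String> reduceInput = new ArrayList<>();
        for (List<List<String>> oneMapOutput : mapOutputs) {
            reduceInput.addAll(oneMapOutput.get(r));
        }
        return reduceInput;
    }

    public static void main(String[] args) {
        int numReduces = 2;
        List<List<List<String>>> mapOutputs = new ArrayList<>();
        mapOutputs.add(runMap(Arrays.asList("a", "b"), numReduces));
        mapOutputs.add(runMap(Arrays.asList("b", "c", "a"), numReduces));

        for (int r = 0; r < numReduces; r++) {
            System.out.println("reduce " + r + " <- " + fetchForReduce(mapOutputs, r));
        }
        // prints:
        // reduce 0 <- [b, b]
        // reduce 1 <- [a, c, a]
    }
}
```

Each reduce ends up with records from both maps, and every copy of a given
key lands at the same reduce, which is the all-to-all movement that the
name "shuffle" refers to.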

-- Owen
