hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-200) The map task names are sent to the reduces
Date Sun, 07 May 2006 03:55:20 GMT
The map task names are sent to the reduces

         Key: HADOOP-200
         URL: http://issues.apache.org/jira/browse/HADOOP-200
     Project: Hadoop
        Type: Bug

  Components: mapred  
    Versions: 0.2    
    Reporter: Owen O'Malley
 Assigned to: Owen O'Malley 
     Fix For: 0.3

As each reduce is created, it is given the entire set of potential map names. For my large
sort jobs with 64k maps, this means that each reduce task is given a two-dimensional array
of 5 tasks/map * 64k maps = 320k strings. Because the reduce task is passed from the job
tracker to the task tracker and down to the task runner, passing the entire list is very expensive.
I suspect that this is the cause of the slowdowns that I see in the task trackers' heartbeats
when the reduce tasks are being launched.
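
To make the arithmetic above concrete, here is a minimal sketch of the count of name strings handed to each reduce. It assumes "64k" means 64 * 1024; the class and method names are hypothetical and are not part of Hadoop.

```java
// Hypothetical helper, not Hadoop code: counts the map name strings
// that a single reduce task would receive under the current scheme.
class MapNameCount {
    // One name per potential task per map.
    static long namesPerReduce(int maps, int tasksPerMap) {
        return (long) maps * tasksPerMap;
    }

    public static void main(String[] args) {
        int maps = 64 * 1024;   // assumption: "64k" read as 64 * 1024
        int tasksPerMap = 5;    // figure taken from the report
        System.out.println(namesPerReduce(maps, tasksPerMap)); // prints 327680
    }
}
```

That count is paid once per reduce task and again at every hop the task description makes.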

I propose changing ReduceTask so that it receives only the count of maps, with map ids running
from 0 to maps - 1:
  public ReduceTask(String jobFile, String taskId, int maps, int partition);
Then we need to change the protocol for finding map outputs:
  MapOutputLocation[] locateMapOutputs(String jobId, int[] mapIds, int partition);
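
A rough sketch of how the two signatures above might fit together follows. Only the constructor and locateMapOutputs signatures come from this report. The fields of MapOutputLocation, the in-memory ToyTracker, the mapIds() helper, and the choice to return locations only for maps that have finished are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; these are not the real Hadoop classes.
class ReduceTaskSketch {

    /** Hypothetical location record: which host holds a map's output. */
    static class MapOutputLocation {
        final int mapId;
        final String host;

        MapOutputLocation(int mapId, String host) {
            this.mapId = mapId;
            this.host = host;
        }
    }

    /** Reduce task that carries the map count instead of every map name. */
    static class ReduceTask {
        final String jobFile;
        final String taskId;
        final int maps;
        final int partition;

        ReduceTask(String jobFile, String taskId, int maps, int partition) {
            this.jobFile = jobFile;
            this.taskId = taskId;
            this.maps = maps;
            this.partition = partition;
        }

        /** Map ids are implicit: 0 .. maps - 1 (hypothetical helper). */
        int[] mapIds() {
            int[] ids = new int[maps];
            for (int i = 0; i < maps; i++) {
                ids[i] = i;
            }
            return ids;
        }
    }

    /** Toy in-memory stand-in for the tracker side of the lookup. */
    static class ToyTracker {
        private final Map<Integer, String> finished = new HashMap<>();

        void mapFinished(int mapId, String host) {
            finished.put(mapId, host);
        }

        // jobId and partition are unused in this toy; a real lookup
        // would need them to select the right job and output partition.
        MapOutputLocation[] locateMapOutputs(String jobId, int[] mapIds, int partition) {
            List<MapOutputLocation> found = new ArrayList<>();
            for (int id : mapIds) {
                String host = finished.get(id);
                if (host != null) {
                    found.add(new MapOutputLocation(id, host));
                }
            }
            return found.toArray(new MapOutputLocation[0]);
        }
    }

    public static void main(String[] args) {
        ReduceTask task = new ReduceTask("job.xml", "reduce_0", 4, 0);
        ToyTracker tracker = new ToyTracker();
        tracker.mapFinished(1, "host-a");
        tracker.mapFinished(3, "host-b");

        MapOutputLocation[] locs =
            tracker.locateMapOutputs("job_0001", task.mapIds(), task.partition);
        for (MapOutputLocation loc : locs) {
            System.out.println(loc.mapId + " -> " + loc.host);
        }
    }
}
```

With this shape the task description stays the same size no matter how many maps the job has, and the reduce asks for locations by integer id instead of by name.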

This message is automatically generated by JIRA.