hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-248) locating map outputs via random probing is inefficient
Date Tue, 23 May 2006 23:27:29 GMT
locating map outputs via random probing is inefficient

         Key: HADOOP-248
         URL: http://issues.apache.org/jira/browse/HADOOP-248
     Project: Hadoop
        Type: Improvement

  Components: mapred  
    Versions: 0.2.1    
    Reporter: Owen O'Malley
 Assigned to: Owen O'Malley 
     Fix For: 0.3

Currently, the ReduceTaskRunner polls the JobTracker with a random list of map tasks, asking
for their output locations. It would be better if the JobTracker kept an ordered log of map
output locations and the interface were changed to:

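class MapLocationResults {
   // value to pass as prevTimestamp on the next call
   public int getTimestamp();
   // map output locations added to the log since prevTimestamp
   public MapOutputLocation[] getLocations();
}

interface InterTrackerProtocol {
  MapLocationResults locateMapOutputs(int prevTimestamp);
}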

with the intention that each time a ReduceTaskRunner calls locateMapOutputs, it passes back
the "timestamp" that it got from the previous result. That way, reduces can easily find the
new MapOutputs. This should help the "ramp up" when the maps first start finishing.
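
A minimal sketch of how the JobTracker side of this proposal might look, assuming the
"timestamp" is simply the length of an append-only log at the time of the call. This is an
illustration, not the eventual Hadoop implementation: MapOutputLog and recordCompletion are
hypothetical names, and MapOutputLocation here is a stand-in with made-up fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's MapOutputLocation; the fields are illustrative only.
class MapOutputLocation {
    final String mapTaskId;
    final String host;

    MapOutputLocation(String mapTaskId, String host) {
        this.mapTaskId = mapTaskId;
        this.host = host;
    }
}

// Result shape from the proposal: a "timestamp" plus the locations that are new to the caller.
class MapLocationResults {
    private final int timestamp;
    private final MapOutputLocation[] locations;

    MapLocationResults(int timestamp, MapOutputLocation[] locations) {
        this.timestamp = timestamp;
        this.locations = locations;
    }

    public int getTimestamp() { return timestamp; }
    public MapOutputLocation[] getLocations() { return locations; }
}

// Hypothetical ordered log kept by the JobTracker.
// Assumption: the "timestamp" is the log length, i.e. the index of the next entry.
class MapOutputLog {
    private final List<MapOutputLocation> log = new ArrayList<>();

    // Called when a map task finishes; entries are appended in completion order.
    synchronized void recordCompletion(MapOutputLocation location) {
        log.add(location);
    }

    // Returns every entry appended since prevTimestamp, plus the new timestamp.
    synchronized MapLocationResults locateMapOutputs(int prevTimestamp) {
        // Clamp so that an out-of-range timestamp cannot cause an exception.
        int from = Math.max(0, Math.min(prevTimestamp, log.size()));
        MapOutputLocation[] fresh =
            log.subList(from, log.size()).toArray(new MapOutputLocation[0]);
        return new MapLocationResults(log.size(), fresh);
    }
}
```

Under these assumptions, a reduce would start with prevTimestamp = 0 and feed each returned
getTimestamp() into its next call, so every poll returns only the map outputs that completed
since the previous poll, and an empty array when nothing new has finished.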

