hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-248) locating map outputs via random probing is inefficient
Date Thu, 25 Jan 2007 23:41:49 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Owen O'Malley updated HADOOP-248:

    Attachment: 248-9.patch

This is a minor modification of Devaraj's patch that fixes a spelling mistake (OBSELETE -> OBSOLETE) and
uses TaskInProgress.partition instead of parsing the task id.

> locating map outputs via random probing is inefficient
> ------------------------------------------------------
>                 Key: HADOOP-248
>                 URL: https://issues.apache.org/jira/browse/HADOOP-248
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.2.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>         Attachments: 248-9.patch, 248-initial7.patch, 248-initial8.patch
> Currently the ReduceTaskRunner polls the JobTracker for a random list of map tasks, asking
> for their output locations. It would be better if the JobTracker kept an ordered log and the
> interface were changed to:
> class MapLocationResults {
>    public int getTimestamp();
>    public MapOutputLocation[] getLocations();
> }
> interface InterTrackerProtocol {
>   ...
>   MapLocationResults locateMapOutputs(int prevTimestamp);
> } 
> with the intention that each time a ReduceTaskRunner calls locateMapOutputs, it passes
> back the "timestamp" that it got from the previous result. That way, reduces can easily find
> the new MapOutputs. This should help the "ramp up" when the maps first start finishing.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
