hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-248) locating map outputs via random probing is inefficient
Date Thu, 22 Feb 2007 14:47:06 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Devaraj Das updated HADOOP-248:

    Attachment: 248-fixed1.patch

The last submission had a problem that was introduced unintentionally (by Owen, when he corrected
the spelling of OBSOLETE, etc.). A variable called fromEventId tracks the eventId from which a
tasktracker should next fetch events from the jobtracker. This was earlier an IntWritable object,
so the callee could call set(<somenumber>) on it and the caller would still see the new 'int'
value after the method returned. The variable was changed to a plain int and updated with
"fromEventId += <somenumber>" instead. Because Java passes an int by value, that update is not
visible to the caller once the method returns, so the TaskTracker would get stuck at a particular
eventId and make no forward progress.
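
A minimal Java sketch of the pitfall, assuming a hypothetical MutableInt class as a stand-in for
IntWritable (the real class lives in org.apache.hadoop.io and is not needed to show the effect).
Reassigning an int parameter only changes the callee's local copy, while mutating a shared holder
object is visible to the caller:

```java
// Hypothetical stand-in for org.apache.hadoop.io.IntWritable: a mutable int holder.
class MutableInt {
    private int value;
    MutableInt(int value) { this.value = value; }
    int get() { return value; }
    void set(int value) { this.value = value; }
}

public class FromEventIdDemo {
    // Broken variant: Java passes the int by value, so this only updates a local copy.
    static void advancePrimitive(int fromEventId, int fetched) {
        fromEventId += fetched;
    }

    // Working variant: caller and callee share the same holder object.
    static void advanceHolder(MutableInt fromEventId, int fetched) {
        fromEventId.set(fromEventId.get() + fetched);
    }

    public static void main(String[] args) {
        int primitive = 0;
        advancePrimitive(primitive, 5);
        System.out.println("primitive after call: " + primitive); // prints 0

        MutableInt holder = new MutableInt(0);
        advanceHolder(holder, 5);
        System.out.println("holder after call: " + holder.get()); // prints 5
    }
}
```
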
Attached is the new patch, which puts the IntWritable back in. The method JobClient.listEvents
has also been modified to take two extra arguments, fromEventId and numEvents (this method didn't
exist when I was earlier working on this issue). The JobSubmissionProtocol version has also been
changed to reflect the change in the getTaskCompletionEvents protocol method (I missed this in the
earlier patch).
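
The paging that the fromEventId/numEvents arguments allow can be sketched as a poll loop that
advances its cursor by the number of events actually returned. CompletionEventSource and
EventPoller are hypothetical names chosen for illustration, not the Hadoop API, and events are
plain strings here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical event source standing in for the jobtracker's getTaskCompletionEvents call.
interface CompletionEventSource {
    // Returns at most maxEvents events, starting at index fromEventId.
    List<String> getEvents(int fromEventId, int maxEvents);
}

public class EventPoller {
    private int fromEventId = 0; // cursor owned by the poller, advanced after each batch
    private final CompletionEventSource source;
    private final int batchSize;

    EventPoller(CompletionEventSource source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    // Fetches one batch and advances the cursor by the number of events actually returned.
    List<String> pollOnce() {
        List<String> batch = source.getEvents(fromEventId, batchSize);
        fromEventId += batch.size();
        return batch;
    }

    int getFromEventId() { return fromEventId; }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2", "m3", "m4");
        // In-memory source: slices the log, clamping both ends to its size.
        CompletionEventSource src = (from, max) -> new ArrayList<>(
                log.subList(Math.min(from, log.size()), Math.min(from + max, log.size())));
        EventPoller poller = new EventPoller(src, 2);
        System.out.println(poller.pollOnce()); // prints [m0, m1]
        System.out.println(poller.pollOnce()); // prints [m2, m3]
        System.out.println(poller.pollOnce()); // prints [m4]
        System.out.println(poller.pollOnce()); // prints []
    }
}
```

Keeping the cursor in a field of the poller, rather than passing it around as an int parameter,
is one way to sidestep the pass-by-value problem; the patch itself restores the IntWritable holder.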

> locating map outputs via random probing is inefficient
> ------------------------------------------------------
>                 Key: HADOOP-248
>                 URL: https://issues.apache.org/jira/browse/HADOOP-248
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.2.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>         Attachments: 248-9.patch, 248-fixed1.patch, 248-initial7.patch, 248-initial8.patch
> Currently the ReduceTaskRunner polls the JobTracker for a random list of map tasks asking
> for their output locations. It would be better if the JobTracker kept an ordered log and the
> interface was changed to:
> class MapLocationResults {
>    public int getTimestamp();
>    public MapOutputLocation[] getLocations();
> }
> interface InterTrackerProtocol {
>   ...
>   MapLocationResults locateMapOutputs(int prevTimestamp);
> } 
> with the intention that each time a ReduceTaskRunner calls locateMapOutputs, it passes
> back the "timestamp" that it got from the previous result. That way, reduces can easily find
> the new MapOutputs. This should help the "ramp up" when the maps first start finishing.
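
A sketch of the proposed timestamp-based interface, assuming the "timestamp" is simply the length
of an append-only log kept on the JobTracker side. MapLocationLog and LocateDemo are hypothetical
names, and locations are plain strings instead of MapOutputLocation objects:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical result type modeled on the proposed MapLocationResults.
final class MapLocationResults {
    private final int timestamp;
    private final List<String> locations;

    MapLocationResults(int timestamp, List<String> locations) {
        this.timestamp = timestamp;
        this.locations = locations;
    }

    public int getTimestamp() { return timestamp; }
    public List<String> getLocations() { return locations; }
}

// Assumed JobTracker-side ordered log; the "timestamp" is the log length at call time.
class MapLocationLog {
    private final List<String> log = new ArrayList<>();

    synchronized void recordMapOutput(String location) {
        log.add(location);
    }

    // Returns every location appended after prevTimestamp, plus the timestamp to pass next time.
    synchronized MapLocationResults locateMapOutputs(int prevTimestamp) {
        int start = Math.max(0, Math.min(prevTimestamp, log.size()));
        List<String> fresh = new ArrayList<>(log.subList(start, log.size()));
        return new MapLocationResults(log.size(), Collections.unmodifiableList(fresh));
    }
}

public class LocateDemo {
    public static void main(String[] args) {
        MapLocationLog tracker = new MapLocationLog();
        tracker.recordMapOutput("host1:map_0");
        tracker.recordMapOutput("host2:map_1");

        int ts = 0; // a reduce starts from timestamp 0
        MapLocationResults r1 = tracker.locateMapOutputs(ts);
        System.out.println(r1.getLocations()); // prints [host1:map_0, host2:map_1]
        ts = r1.getTimestamp(); // now 2

        tracker.recordMapOutput("host3:map_2");
        MapLocationResults r2 = tracker.locateMapOutputs(ts);
        System.out.println(r2.getLocations()); // prints [host3:map_2]
    }
}
```

Because each call returns the next timestamp, the caller just keeps the returned value, so no
in-out parameter is involved.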

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
