hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1183) MapTask completion not recorded properly at the Reducer's end
Date Thu, 29 Mar 2007 15:13:25 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Devaraj Das updated HADOOP-1183:

    Attachment: 1183.patch

Retries of map output fetches might overwrite new events received from the JT for the same
maps. Let's assume that a tasktracker is lost while we are in the process of fetching map outputs
from it. There is a timing issue between when a map output fetch completes with a failure
and when a new event for the same map task is obtained. If the new event arrives before the
failure is reported, and the fetch corresponding to the new event has not been scheduled by then,
the new event is lost (overwritten by the retry of the old failed fetch).

The attached patch should handle this issue: FAILED events are now handled explicitly.
Please review it (while I am testing it on a big cluster).
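
To make the race easier to follow, here is a minimal sketch of it in Java. This is not the code
in 1183.patch. The class MapOutputLocations and all of its methods are hypothetical names made
up for illustration, and the bookkeeping is an assumption about one way the reducer side could
track locations. retryFetchUnsafe shows the overwrite described above; retryFetch shows one
possible guard that honors FAILED events and never clobbers a newer location.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of reducer-side bookkeeping; not the actual Hadoop code. */
public class MapOutputLocations {
    // mapId -> tasktracker from which the output should be fetched next
    private final Map<Integer, String> toFetch = new HashMap<>();
    // "mapId@tracker" entries that the JT has reported as FAILED
    private final Set<String> obsolete = new HashSet<>();

    private static String key(int mapId, String tracker) {
        return mapId + "@" + tracker;
    }

    /** A new completion event for the map: remember where its output lives. */
    public void onSucceededEvent(int mapId, String tracker) {
        toFetch.put(mapId, tracker);
    }

    /** Explicit FAILED handling: mark the output obsolete and drop it if queued. */
    public void onFailedEvent(int mapId, String tracker) {
        obsolete.add(key(mapId, tracker));
        toFetch.remove(mapId, tracker);
    }

    /** Schedule a fetch: the entry leaves the table while the fetch is in flight. */
    public String startFetch(int mapId) {
        return toFetch.remove(mapId);
    }

    /** Racy retry: blindly re-queues the old location, clobbering a newer event. */
    public void retryFetchUnsafe(int mapId, String failedTracker) {
        toFetch.put(mapId, failedTracker);
    }

    /** Guarded retry: skip obsolete outputs and never overwrite a newer location. */
    public void retryFetch(int mapId, String failedTracker) {
        if (obsolete.contains(key(mapId, failedTracker))) {
            return;
        }
        toFetch.putIfAbsent(mapId, failedTracker);
    }

    /** Location currently queued for the map, or null if none. */
    public String locationFor(int mapId) {
        return toFetch.get(mapId);
    }
}
```

In this model the problematic sequence is: startFetch for a map on the lost tasktracker, then a
new event for the same map arrives, then the old fetch reports its failure. With retryFetchUnsafe
the table points back at the lost tasktracker; with retryFetch the newer location is kept.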

> MapTask completion not recorded properly at the Reducer's end
> -------------------------------------------------------------
>                 Key: HADOOP-1183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1183
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>            Priority: Critical
>         Attachments: 1183.patch
> A couple of reducers were continuously trying to fetch map outputs from a lost tasktracker.
> Although the tasks running on that lost TT successfully reexecuted elsewhere, the Reducers'
> tasktrackers didn't correctly note those events.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
