hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3478) The algorithm to decide map re-execution on fetch failures can be improved
Date Mon, 02 Jun 2008 16:45:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601670#action_12601670 ]

Devaraj Das commented on HADOOP-3478:
-------------------------------------

Yes, we should *randomize by the hosts*. But for a given host we should sort the outputs by
their MapIDs so that faults are detected early enough (see the comments above). The knownOutputs
structure is a list today. We could do away with it and instead maintain a map from locations
to MapIDs (whenever we get a map completion event, we know the location anyway).
To protect against premature or overly aggressive killing, we should probably keep the
strategy of waiting for notifications from multiple reducers for every map. Since map failure
notifications are sent only after a certain number of retries, that should be enough to protect
the maps against temporary network glitches.
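
To make the proposal concrete, here is a rough sketch of what I have in mind. It is not
taken from the Hadoop code base: the class name, the method names, the use of plain ints
for MapIDs and the threshold of three reducers are all assumptions made for illustration.
It keeps a map from host to sorted MapIDs, visits the hosts in random order, and flags a
map for re-execution only after enough distinct reducers have reported it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; none of these names exist in Hadoop. */
class FetchPlanSketch {
    // Assumed threshold: distinct reducers that must report a map
    // before it becomes a candidate for re-execution.
    static final int MIN_REPORTING_REDUCERS = 3;

    // host -> MapIDs whose output lives on that host (TreeSet keeps them sorted)
    private final Map<String, TreeSet<Integer>> outputsByHost = new HashMap<>();
    // MapID -> reducers that have reported a fetch failure for it
    private final Map<Integer, Set<String>> failureReports = new HashMap<>();

    /** Record a map completion event; the event already carries the location. */
    public void addMapOutput(String host, int mapId) {
        outputsByHost.computeIfAbsent(host, h -> new TreeSet<>()).add(mapId);
    }

    /** Hosts in random order; within each host, MapIDs in ascending order. */
    public Map<String, List<Integer>> fetchOrder(Random rng) {
        List<String> hosts = new ArrayList<>(outputsByHost.keySet());
        Collections.shuffle(hosts, rng);
        Map<String, List<Integer>> order = new LinkedHashMap<>();
        for (String host : hosts) {
            order.put(host, new ArrayList<>(outputsByHost.get(host)));
        }
        return order;
    }

    /**
     * Record a failure notification from one reducer. Returns true once
     * enough distinct reducers have reported the same map. Repeated
     * reports from the same reducer are counted only once.
     */
    public boolean reportFetchFailure(int mapId, String reducerId) {
        Set<String> reporters =
            failureReports.computeIfAbsent(mapId, id -> new HashSet<>());
        reporters.add(reducerId);
        return reporters.size() >= MIN_REPORTING_REDUCERS;
    }
}
```

Counting distinct reducers rather than raw notifications is a choice made in this sketch,
so that a single reducer with a bad link cannot get a healthy map killed on its own.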

> The algorithm to decide map re-execution on fetch failures can be improved
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-3478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3478
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Jothi Padmanabhan
>
> The algorithm to decide map re-execution on fetch failures can be improved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

