[ https://issues.apache.org/jira/browse/HADOOP-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601680#action_12601680 ]
Runping Qi commented on HADOOP-3478:
------------------------------------
bq. In order to protect against early or too aggressive killing, we should probably maintain
the strategy of waiting for notifications from multiple reducers for all maps. Since the
map failure notifications are sent only after a certain number of retries, we should be okay
in protecting the maps against temporary network glitches.
We should differentiate between the progress stages of the job.
If there are a lot of unfinished mappers, then we should not re-execute mappers aggressively.
If reducers have a lot of un-fetched map outputs, they can wait for a longer period of time before
re-fetching the map outputs that they failed to fetch previously. However, if one or more reducers
are waiting for only one or a few map outputs, then the reducers should retry aggressively, and if
the failure persists, the mappers should be re-executed aggressively.
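To make the proposal above concrete, here is a rough sketch of how the stage-dependent decision might look. It is only an illustration: the class FetchRetryPolicy, its methods, and every threshold and delay value are hypothetical and are not taken from the Hadoop code base. It also folds the two signals (fraction of unfinished mappers, number of map outputs a reducer is still waiting for) into a single mode, which is a simplification.

```java
import java.time.Duration;

// Hypothetical sketch: none of these names, thresholds, or delays come from Hadoop.
final class FetchRetryPolicy {

    /** How a reducer (and the scheduler) should react to a failed map-output fetch. */
    enum Mode { PATIENT, AGGRESSIVE }

    // Assumed cut-offs, chosen only for illustration.
    static final double MANY_UNFINISHED_MAPS = 0.10; // fraction of maps still unfinished
    static final int FEW_PENDING_OUTPUTS = 3;        // "one or a few" map outputs

    /**
     * Picks a mode from the job's progress stage.
     *
     * @param totalMaps      number of map tasks in the job
     * @param unfinishedMaps number of map tasks that have not completed yet
     * @param pendingOutputs number of map outputs this reducer has not fetched yet
     */
    static Mode choose(int totalMaps, int unfinishedMaps, int pendingOutputs) {
        double unfinishedFraction =
                totalMaps == 0 ? 0.0 : (double) unfinishedMaps / totalMaps;

        // Many mappers still running: do not re-execute maps aggressively.
        if (unfinishedFraction > MANY_UNFINISHED_MAPS) {
            return Mode.PATIENT;
        }
        // Reducer is blocked on only a few outputs: retry (and re-execute) quickly.
        // Otherwise it has plenty of other outputs to fetch and can afford to wait.
        return pendingOutputs <= FEW_PENDING_OUTPUTS ? Mode.AGGRESSIVE : Mode.PATIENT;
    }

    /** Delay before a reducer re-fetches an output that failed earlier (assumed values). */
    static Duration retryDelay(Mode mode) {
        return mode == Mode.AGGRESSIVE ? Duration.ofSeconds(5) : Duration.ofSeconds(60);
    }

    /** Failed fetches tolerated before asking for the map to be re-run (assumed values). */
    static int failuresBeforeReexecution(Mode mode) {
        return mode == Mode.AGGRESSIVE ? 2 : 6;
    }
}
```

For example, FetchRetryPolicy.choose(1000, 400, 300) selects the patient mode, while FetchRetryPolicy.choose(1000, 0, 2) selects the aggressive one, so a reducer stuck on the last two outputs would retry sooner and trigger re-execution after fewer failures.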
> The algorithm to decide map re-execution on fetch failures can be improved
> --------------------------------------------------------------------------
>
> Key: HADOOP-3478
> URL: https://issues.apache.org/jira/browse/HADOOP-3478
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Jothi Padmanabhan
>
> The algorithm to decide map re-execution on fetch failures can be improved.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.