hadoop-mapreduce-issues mailing list archives

From "gaoyu (Jira)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-7349) An unexpected node crash and delayed messages would fail the job
Date Thu, 03 Jun 2021 08:28:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyu updated MAPREDUCE-7349:

Related cluster configuration:
 * NodeManager recovery is disabled

Bug scenario:
 # submit a wordcount job that contains two simple map tasks ({{map_0}} and {{map_1}}) and one simple reduce task ({{reduce_0}});
 # all map tasks finish successfully and the AppMaster is notified;
 # the NodeManager that ran map task {{map_1}} crashes;
 # the AppMaster schedules a reduce attempt;
 # the reduce attempt sends a {{statusUpdate}} message to the AppMaster to report a fetch failure;
 # the reduce attempt fails with {{Shuffle$ShuffleError}}, caused by {{java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out}};
 # the reduce attempt sends a {{fatalError}} message to the AppMaster;
 # the AppMaster reschedules three more reduce attempts in succession, but all of them fail with {{Shuffle$ShuffleError}};
 # the AppMaster fails the wordcount job because the reduce task has failed;
 # the AppMaster then receives three delayed {{statusUpdate}} messages reporting a fetch failure like the one in step 5, but because it has already failed the job, it never reruns {{map_1}}.
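The ordering problem in the scenario above can be illustrated with a toy model of the AppMaster's event handling. This is a hypothetical sketch, not the real {{JobImpl}}/{{TaskImpl}} code: the class {{FetchFailureRace}}, its event names, and the rerun threshold of three fetch-failure reports per map are all assumptions chosen for illustration (the real rerun condition is more involved and configurable). The limit of four reduce attempts assumes {{mapreduce.reduce.maxattempts}} is left at its default of 4, which matches the one original attempt plus three retries described above. Feeding the same events in two orders shows that the outcome depends only on whether the fetch-failure reports arrive before the reduce task exhausts its attempts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified model of the race in MAPREDUCE-7349. */
public class FetchFailureRace {

    // Assumption: fetch-failure reports for one map before the AM reruns that map.
    static final int FETCH_FAILURE_THRESHOLD = 3;
    // Assumes mapreduce.reduce.maxattempts is at its default of 4.
    static final int MAX_REDUCE_ATTEMPTS = 4;

    public enum Kind { FETCH_FAILURE, REDUCE_ATTEMPT_FAILED }

    public static final class Event {
        final Kind kind;
        final String mapTask; // null for reduce-attempt failures

        Event(Kind kind, String mapTask) {
            this.kind = kind;
            this.mapTask = mapTask;
        }
    }

    public static Event fetchFailure(String mapTask) {
        return new Event(Kind.FETCH_FAILURE, mapTask);
    }

    public static Event reduceFailed() {
        return new Event(Kind.REDUCE_ATTEMPT_FAILED, null);
    }

    /** Handles events strictly in arrival order and returns the first decisive outcome. */
    public static String process(List<Event> events) {
        Map<String, Integer> fetchFailures = new HashMap<>();
        int failedReduceAttempts = 0;
        for (Event e : events) {
            if (e.kind == Kind.FETCH_FAILURE) {
                int n = fetchFailures.merge(e.mapTask, 1, Integer::sum);
                if (n >= FETCH_FAILURE_THRESHOLD) {
                    return "RERUN " + e.mapTask; // the lost map output would be regenerated
                }
            } else {
                failedReduceAttempts++;
                if (failedReduceAttempts >= MAX_REDUCE_ATTEMPTS) {
                    // Once the job has failed, later (delayed) reports are never acted on.
                    return "JOB FAILED";
                }
            }
        }
        return "RUNNING";
    }

    public static void main(String[] args) {
        // Order from the report: one fetch-failure report (step 5), four failed
        // reduce attempts (steps 6 to 8), then three delayed reports (step 10).
        List<Event> delayed = Arrays.asList(
                fetchFailure("map_1"),
                reduceFailed(), reduceFailed(), reduceFailed(), reduceFailed(),
                fetchFailure("map_1"), fetchFailure("map_1"), fetchFailure("map_1"));
        // Same events, but each attempt's report arrives before that attempt's failure.
        List<Event> timely = Arrays.asList(
                fetchFailure("map_1"), reduceFailed(),
                fetchFailure("map_1"), reduceFailed(),
                fetchFailure("map_1"), reduceFailed(),
                fetchFailure("map_1"), reduceFailed());
        System.out.println("delayed: " + process(delayed)); // delayed: JOB FAILED
        System.out.println("timely:  " + process(timely));  // timely:  RERUN map_1
    }
}
```

In this model nothing about the cluster differs between the two runs; only the arrival order of the {{statusUpdate}} reports relative to the reduce-attempt failures decides whether {{map_1}} is rerun or the job fails.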

> An unexpected node crash and delayed messages would fail the job
> ----------------------------------------------------------------
>                 Key: MAPREDUCE-7349
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7349
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 3.2.2
>            Reporter: gaoyu
>            Priority: Major

This message was sent by Atlassian Jira

