hadoop-mapreduce-issues mailing list archives

From "Nicolas Fraison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6982) Containers on lost nodes are considered failed after a too long time.
Date Fri, 13 Oct 2017 10:00:00 GMT
Nicolas Fraison created MAPREDUCE-6982:
------------------------------------------

             Summary: Containers on lost nodes are considered failed after a too long time.
                 Key: MAPREDUCE-6982
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6982
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am
    Affects Versions: 2.6.0
         Environment: cdh5.5.0
            Reporter: Nicolas Fraison
            Priority: Minor


Containers on lost nodes (NodeManager service or server unavailable) are considered
failed only after an excessively long time.
This is because the AppMaster keeps trying to clean up the container on the unavailable node.
The proposed patch limits the impact of this timeout by handling lost-NodeManager events
in the AM as described below:
*     On NodeManager service unavailability (crash, OOM, ...):
    when receiving a lost-NodeManager event, the AM fails the impacted attempt and does not go through
the cleanup stage.
*     On NodeManager server unavailability: with default settings, the AM first detects that the
attempt has timed out and tries to clean it up.
When receiving a lost-NodeManager event, the AM stops the cleanup process on the impacted container
and fails the attempt.

This reduces the time before the attempt is failed to the timeout for detecting that a NodeManager is down.
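The two cases above can be modelled as a small state machine. The sketch below is a hypothetical, language-neutral illustration of the proposed AM behaviour, not the actual patch or Hadoop's TaskAttemptImpl API: the class names, the states RUNNING/CLEANING_UP/FAILED and the methods on_task_timeout/on_node_lost are all assumptions made for illustration. It shows that a lost-NodeManager event fails every attempt on that node, whether the attempt is still running (service crash/OOM) or already stuck in cleanup (server unreachable after a task timeout).

```python
from enum import Enum, auto


class AttemptState(Enum):
    RUNNING = auto()
    CLEANING_UP = auto()  # AM is trying to reach the NodeManager to stop the container
    FAILED = auto()


class Attempt:
    def __init__(self, attempt_id, node):
        self.attempt_id = attempt_id
        self.node = node
        self.state = AttemptState.RUNNING


class AppMasterSketch:
    """Hypothetical model of the AM-side handling proposed in MAPREDUCE-6982."""

    def __init__(self):
        self.attempts = {}

    def add_attempt(self, attempt):
        self.attempts[attempt.attempt_id] = attempt

    def on_task_timeout(self, attempt_id):
        # Existing path: a timed-out attempt enters cleanup, which needs the
        # NodeManager to respond before the attempt can be marked failed.
        attempt = self.attempts[attempt_id]
        if attempt.state is AttemptState.RUNNING:
            attempt.state = AttemptState.CLEANING_UP

    def on_node_lost(self, node):
        # Proposed path: every non-failed attempt on the lost node fails at once,
        # skipping cleanup (RUNNING) or abandoning it (CLEANING_UP).
        failed = []
        for attempt in self.attempts.values():
            if attempt.node == node and attempt.state is not AttemptState.FAILED:
                attempt.state = AttemptState.FAILED
                failed.append(attempt.attempt_id)
        return failed
```

For example, with attempts "a1" and "a2" on node "nm1" and "a3" on "nm2", calling on_task_timeout("a2") followed by on_node_lost("nm1") fails both "a1" and "a2" immediately, while "a3" keeps running. The attempt no longer waits on a cleanup call to an unreachable node.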

This is a similar issue to [MAPREDUCE-6659|https://issues.apache.org/jira/browse/MAPREDUCE-6659],
to which I can't attach the patch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

