hadoop-yarn-dev mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3474) Add a way to let NM wait RM to come back, not kill running containers
Date Fri, 10 Apr 2015 14:04:12 GMT
Jun Gong created YARN-3474:
------------------------------

             Summary: Add a way to let NM wait RM to come back, not kill running containers
                 Key: YARN-3474
                 URL: https://issues.apache.org/jira/browse/YARN-3474
             Project: Hadoop YARN
          Issue Type: New Feature
            Reporter: Jun Gong
            Assignee: Jun Gong


When RM HA is enabled and the active RM shuts down, the standby RM becomes active and recovers
apps and attempts. Apps are not affected.

However, some cases or bugs can prevent both RMs from starting normally (e.g. [YARN-2340|https://issues.apache.org/jira/browse/YARN-2340],
or the RM cannot connect to ZK properly). An NM kills the containers running on it when it cannot
heartbeat with the RM for some time (the max retry time is 15 minutes by default). As a result,
all apps are killed.

In a production cluster, we might come across the cases above. To keep apps from being affected
and killed by the NM, a YARN admin could set a flag that tells the NM to wait for the RM to come
back and not kill running containers. After the RM starts normally, the admin clears the flag.
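
To make the proposal concrete, here is a rough sketch of the decision an NM would make while the
RM is unreachable. It is only an illustration, not NodeManager code: the class NmOutagePolicy,
the decide method, the Action enum and the waitForRmFlag parameter are all hypothetical names, and
how the flag would actually be set and cleared is left open.

```java
import java.time.Duration;

/** Hypothetical model of the NM's choice while the RM is unreachable. Not Hadoop code. */
final class NmOutagePolicy {

    enum Action { KEEP_RETRYING, KILL_CONTAINERS }

    // 15 minutes: the default max retry time mentioned in this issue.
    static final Duration DEFAULT_MAX_WAIT = Duration.ofMinutes(15);

    private NmOutagePolicy() {}

    /**
     * @param sinceLastHeartbeat time elapsed since the last successful heartbeat to the RM
     * @param maxWait            max retry time before the NM gives up
     * @param waitForRmFlag      hypothetical admin flag proposed in this issue
     */
    static Action decide(Duration sinceLastHeartbeat, Duration maxWait, boolean waitForRmFlag) {
        if (waitForRmFlag) {
            // Proposed behavior: keep containers running and keep retrying until the RM is back.
            return Action.KEEP_RETRYING;
        }
        // Behavior described above: once the retry window is exhausted, containers are killed.
        return sinceLastHeartbeat.compareTo(maxWait) >= 0
                ? Action.KILL_CONTAINERS
                : Action.KEEP_RETRYING;
    }
}
```

With the flag unset, an NM that has not reached the RM for 20 minutes would fall into the
KILL_CONTAINERS branch. With the flag set, the same NM would keep retrying, and clearing the flag
once the RM is healthy would restore the normal timeout.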



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
