hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4401) A failed app recovery should not prevent the RM from starting
Date Wed, 02 Dec 2015 09:01:10 GMT

    [ https://issues.apache.org/jira/browse/YARN-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035492#comment-15035492 ]

Sunil G commented on YARN-4401:
-------------------------------

Hi [~templedf],
I am not sure about the use case here. However, if such a case occurs, the logs should
contain enough information to identify the app ID.
We can then use the command below to clear such apps if necessary, rather than forcibly
clearing them from rmcontext.
{noformat}
Usage: yarn resourcemanager [-format-state-store]
                            [-remove-application-from-state-store <appId>]
{noformat}
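
As a rough illustration of the workflow described above (find the app ID in the logs, then build the removal command), here is a minimal Python sketch. The log excerpt is invented for illustration and does not reproduce real RM log output; the helper names `find_app_ids` and `removal_command` are hypothetical. The only parts taken from this message are the command and flag shown in the usage text. The script prints the command rather than running it.

```python
import re

# YARN application ids have the form application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def find_app_ids(log_text):
    """Return the distinct application ids in log_text, in first-seen order."""
    seen = []
    for app_id in APP_ID_PATTERN.findall(log_text):
        if app_id not in seen:
            seen.append(app_id)
    return seen


def removal_command(app_id):
    """Build the argument list for the state-store removal command quoted above."""
    return ["yarn", "resourcemanager",
            "-remove-application-from-state-store", app_id]


# Hypothetical log excerpt, invented for this example only.
sample_log = """
ERROR Failed to recover application_1448000000000_0007
ERROR Failed to recover application_1448000000000_0007 (retry)
"""

# Dry run: print each command instead of executing it.
for app_id in find_app_ids(sample_log):
    print(" ".join(removal_command(app_id)))
```

Printing the command first leaves the decision to actually remove an application with the operator.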

> A failed app recovery should not prevent the RM from starting
> -------------------------------------------------------------
>
>                 Key: YARN-4401
>                 URL: https://issues.apache.org/jira/browse/YARN-4401
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4401.001.patch
>
>
> There are many different reasons why an app recovery could fail with an exception, causing
> the RM start to be aborted.  If that happens the RM will fail to start.  Presumably, the reason
> the RM is trying to do a recovery is that it's the standby trying to fill in for the active.
> Failing to come up defeats the purpose of the HA configuration.  Instead of preventing the
> RM from starting, a failed app recovery should log an error and skip the application.
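
The "log an error and skip the application" behavior proposed in the description can be sketched as a simple loop. This is not the attached patch and not ResourceManager code: it is a Python stand-in with hypothetical names (`recover_all`, `fake_recover`) and made-up app IDs, meant only to show the control flow under the assumption that each application is recovered independently.

```python
import logging

logger = logging.getLogger("recovery_sketch")


def recover_all(app_ids, recover_one):
    """Try to recover every application; log and skip any that raises."""
    recovered, skipped = [], []
    for app_id in app_ids:
        try:
            recover_one(app_id)
            recovered.append(app_id)
        except Exception:
            # Record the failure with its traceback, then continue with
            # the remaining applications instead of aborting startup.
            logger.exception("Skipping %s: recovery failed", app_id)
            skipped.append(app_id)
    return recovered, skipped


def fake_recover(app_id):
    """Hypothetical recovery step that fails for one made-up app id."""
    if app_id.endswith("_0002"):
        raise ValueError("corrupt stored state")


recovered, skipped = recover_all(
    ["application_1_0001", "application_1_0002", "application_1_0003"],
    fake_recover,
)
print("recovered:", recovered)
print("skipped:", skipped)
```

Returning the skipped IDs keeps them available for later cleanup, which is the trade-off under discussion in this thread: skipping lets startup proceed, but the skipped entries remain in the state store until they are removed.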



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
