hadoop-yarn-issues mailing list archives

From "Raju Bairishetti (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3644) Node manager shuts down if unable to connect with RM
Date Tue, 26 May 2015 08:18:19 GMT

     [ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raju Bairishetti updated YARN-3644:
-----------------------------------
    Attachment: YARN-3644.patch

Introduced a new config **NODEMANAGER_SHUTSDWON_ON_RM_CONNECTION_FAILURES** that lets users
decide whether the NM shuts down when it is unable to connect to the RM.

The default value is true, which preserves the current behavior.
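
As a rough illustration of the intended behavior, the decision could be modeled as below. This is only a sketch under stated assumptions, not the contents of YARN-3644.patch: the class, enum, and field names are hypothetical stand-ins, and no real Hadoop types are used.

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the NM's reaction to an RM connection failure.
// None of these names come from Hadoop or from the attached patch.
class RmConnectionFailureHandler {
    enum Action { SHUTDOWN, KEEP_RUNNING }

    // Stand-in for the new boolean config; true mirrors the existing behavior.
    private final boolean shutdownOnRmConnectionFailure;

    // Records every action taken, standing in for events sent to a dispatcher.
    private final List<Action> dispatched = new ArrayList<>();

    RmConnectionFailureHandler(boolean shutdownOnRmConnectionFailure) {
        this.shutdownOnRmConnectionFailure = shutdownOnRmConnectionFailure;
    }

    // Called once the connection attempts to the RM have been exhausted.
    Action onConnectFailure(ConnectException e) {
        Action action = shutdownOnRmConnectionFailure
                ? Action.SHUTDOWN       // current behavior: NM shuts itself down
                : Action.KEEP_RUNNING;  // opt-out: NM stays up while the RM is unreachable
        dispatched.add(action);
        return action;
    }

    List<Action> dispatchedActions() {
        return dispatched;
    }
}
```

With the flag set to true the handler reports SHUTDOWN, as the NM does today; with false it reports KEEP_RUNNING, so an operator could leave the NMs up during RM maintenance.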

> Node manager shuts down if unable to connect with RM
> ----------------------------------------------------
>
>                 Key: YARN-3644
>                 URL: https://issues.apache.org/jira/browse/YARN-3644
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Srikanth Sundarrajan
>            Assignee: Raju Bairishetti
>         Attachments: YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>           } catch (ConnectException e) {
>             //catch and throw the exception if tried MAX wait time to connect RM
>             dispatcher.getEventHandler().handle(
>                 new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
>             throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all the NMs shut
> themselves down, requiring additional work to bring the NMs back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects: non-connection
> failures are then retried indefinitely by all YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
