Date: Tue, 26 May 2015 08:18:19 +0000 (UTC)
From: "Raju Bairishetti (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-3644) Node manager shuts down if unable to connect with RM

     [ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raju Bairishetti updated YARN-3644:
-----------------------------------
    Attachment: YARN-3644.patch

Introduced a new config, **NODEMANAGER_SHUTSDWON_ON_RM_CONNECTION_FAILURES**, that lets users decide whether the NM shuts down when it cannot connect to the RM. The default value stays true to preserve the current behavior.
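
For readers without the patch at hand, here is a minimal, self-contained sketch of how a flag like this could gate the shutdown path shown in the catch block quoted in the issue description. It is not the attached patch. The names `NmShutdownSketch`, `RmConnector`, `RecordingDispatcher`, and `register` are hypothetical stand-ins, and returning `false` on a tolerated failure is an assumption about what "not shutting down" might look like.

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

public class NmShutdownSketch {

    // Hypothetical stand-in for the call that registers the NM with the RM.
    // Assumed to throw ConnectException only after its own retries are exhausted.
    interface RmConnector {
        void connect() throws ConnectException;
    }

    // Hypothetical stand-in for dispatcher.getEventHandler().handle(...):
    // it only records the events it receives.
    static final class RecordingDispatcher {
        final List<String> events = new ArrayList<>();

        void handle(String event) {
            events.add(event);
        }
    }

    /**
     * Returns true if the connection succeeded, false if it failed and the
     * failure was tolerated because the flag is off. With the flag on, it
     * mirrors the current behavior: dispatch SHUTDOWN, then rethrow.
     */
    static boolean register(RmConnector rm, RecordingDispatcher dispatcher,
                            boolean shutdownOnRmConnectionFailures) {
        try {
            rm.connect();
            return true;
        } catch (ConnectException e) {
            if (shutdownOnRmConnectionFailures) {
                dispatcher.handle("SHUTDOWN");
                throw new RuntimeException(e);
            }
            // Flag is off: keep the NM process alive and let the caller decide
            // when to try again (assumption, not taken from the patch).
            return false;
        }
    }

    public static void main(String[] args) {
        RmConnector unreachable = () -> {
            throw new ConnectException("RM unreachable");
        };
        RecordingDispatcher dispatcher = new RecordingDispatcher();

        boolean connected = register(unreachable, dispatcher, false);
        System.out.println("connected=" + connected
            + " events=" + dispatcher.events);
    }
}
```

With the flag set to true, the same call would record a `SHUTDOWN` event and throw, which matches the default described above.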

> Node manager shuts down if unable to connect with RM
> ----------------------------------------------------
>
>                 Key: YARN-3644
>                 URL: https://issues.apache.org/jira/browse/YARN-3644
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Srikanth Sundarrajan
>            Assignee: Raju Bairishetti
>         Attachments: YARN-3644.patch
>
>
> When the NM is unable to connect to the RM, the NM shuts itself down.
> {code}
>       } catch (ConnectException e) {
>         //catch and throw the exception if tried MAX wait time to connect RM
>         dispatcher.getEventHandler().handle(
>             new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
>         throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all the NMs shut themselves down, requiring additional work to bring the NMs back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects: non-connection failures are then retried indefinitely by all YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)