Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B22671818B for ; Tue, 25 Aug 2015 12:48:51 +0000 (UTC) Received: (qmail 16141 invoked by uid 500); 25 Aug 2015 12:48:46 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 16092 invoked by uid 500); 25 Aug 2015 12:48:46 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 16076 invoked by uid 99); 25 Aug 2015 12:48:46 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Aug 2015 12:48:46 +0000 Date: Tue, 25 Aug 2015 12:48:46 +0000 (UTC) From: "Rohith Sharma K S (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll() MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711201#comment-14711201 ] Rohith Sharma K S commented on YARN-3893: ----------------------------------------- There are 2 type of refresh can happen i.e. 1. yarn-site.xml refresh, 2. scheduler configurations refresh. Schduler configurations are reloaded for every service initialization which is by design. If any issue in the scheduler configuration, fail-fast configuraton behavior work as same for both true and false. Fail-fast configuration is useful when admin do mistake in configuring mistake in yarn-site.xml. With wrong configuration in yarn-site.xml, RM service can be up whereas with wrong Scheduler configuration , service can NOT be up at all. *On best effort basis for make service up*, handling exception for yarn-site.xml and scheduler configuration are different. BTW, making RM state StandBy would lead to filling up of the logs very soon because of elector continuous try to make active. Any configuration issue, better to exit the JVM and notify admin that RM is down so that admin can check the logs and identify it. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > ------------------------------------------------------------------------------ > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Affects Versions: 2.7.1 > Reporter: Bibin A Chundatt > Assignee: Bibin A Chundatt > Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)