Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Tue, 25 Aug 2015 12:48:46 +0000 (UTC)
From: "Rohith Sharma K S (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12843201.1436278782000.158730.1440506926462@Atlassian.JIRA>
In-Reply-To: <JIRA.12843201.1436278782000@Atlassian.JIRA>
References: <JIRA.12843201.1436278782000@Atlassian.JIRA>
 <JIRA.12843201.1436278782391@arcas>
Subject: [jira] [Commented] (YARN-3893) Both RM in active state when
 Admin#transitionToActive failure from refeshAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711201#comment-14711201 ] 

Rohith Sharma K S commented on YARN-3893:
-----------------------------------------

There are two types of refresh that can happen: 1. yarn-site.xml refresh, and 2. scheduler configuration refresh. Scheduler configurations are reloaded on every service initialization, which is by design. If there is any issue in the scheduler configuration, the fail-fast setting behaves the same whether it is true or false. The fail-fast setting is useful when the admin makes a mistake in yarn-site.xml. With a wrong configuration in yarn-site.xml the RM service can still come up, whereas with a wrong scheduler configuration the service can NOT come up at all. *To bring the service up on a best-effort basis*, exception handling for yarn-site.xml and for the scheduler configuration is different.

BTW, moving the RM to standby state would fill up the logs very quickly, because the elector continuously retries making it active. For any configuration issue, it is better to exit the JVM and notify the admin that the RM is down, so that the admin can check the logs and identify the problem.
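
To make the distinction concrete, here is a minimal sketch of the policy as described in this comment. It is not code from any of the attached patches, and the names RefreshFailureSketch, RefreshKind, Action and onRefreshFailure are hypothetical, not part of the RM code base. It assumes a failed scheduler configuration refresh always ends in a JVM exit, while a failed yarn-site.xml refresh ends in a JVM exit only when fail-fast is true.

```java
// Illustrative only: hypothetical names, not the ResourceManager implementation.
public final class RefreshFailureSketch {

    /** Which kind of configuration refresh failed. */
    public enum RefreshKind { YARN_SITE, SCHEDULER }

    /** What the RM does after the failure (assumed outcomes for this sketch). */
    public enum Action { EXIT_JVM, KEEP_SERVICE_UP }

    private RefreshFailureSketch() { }

    /**
     * Decide how to react to a refresh failure.
     *
     * A broken scheduler configuration means the service cannot come up,
     * so the fail-fast flag makes no difference in that case.
     * A broken yarn-site.xml still lets the service come up, so the
     * fail-fast flag decides between exiting and continuing on best effort.
     */
    public static Action onRefreshFailure(RefreshKind kind, boolean failFast) {
        if (kind == RefreshKind.SCHEDULER) {
            return Action.EXIT_JVM;
        }
        return failFast ? Action.EXIT_JVM : Action.KEEP_SERVICE_UP;
    }
}
```

For example, onRefreshFailure(RefreshKind.YARN_SITE, false) gives KEEP_SERVICE_UP, while any SCHEDULER failure gives EXIT_JVM regardless of the flag. Standby is deliberately not one of the outcomes here, since the elector would keep retrying the transition to active.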

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3893
>                 URL: https://issues.apache.org/jira/browse/YARN-3893
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Both RMs will continuously try to become active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both web UIs are active
> # Status is shown as active for both RMs


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)