hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
Date Tue, 25 Aug 2015 12:48:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711201#comment-14711201 ]

Rohith Sharma K S commented on YARN-3893:
-----------------------------------------

Two types of refresh can happen: 1. yarn-site.xml refresh, and 2. scheduler configuration
refresh. Scheduler configurations are reloaded on every service initialization, which is by
design. If there is an issue in the scheduler configuration, the behavior is the same whether
fail-fast is set to true or false. The fail-fast configuration is useful when the admin makes
a mistake in yarn-site.xml. With a wrong configuration in yarn-site.xml, the RM service
can still come up, whereas with a wrong scheduler configuration the service can NOT come up
at all. *To bring the service up on a best-effort basis*, exception handling for yarn-site.xml
and for the scheduler configuration is different.

BTW, moving the RM to standby state would fill up the logs very quickly, because the elector
continuously tries to make it active. For any configuration issue, it is better to exit the
JVM and notify the admin that the RM is down, so that the admin can check the logs and
identify the problem.
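
To make the distinction above concrete, here is a minimal sketch of the decision as I read
it from this comment. It is not the ResourceManager code and not what any of the attached
patches do: the class, enum and method names are hypothetical, and the assumption that a
scheduler configuration failure ends in a JVM exit follows the suggestion in the previous
paragraph.

```java
/** Illustrative only: hypothetical names, not the actual ResourceManager code. */
public class RefreshFailurePolicy {

    /** The two kinds of refresh distinguished in the comment. */
    public enum RefreshKind { YARN_SITE, SCHEDULER }

    /** Possible reactions when a refresh fails during the transition to active. */
    public enum Action { EXIT_JVM, KEEP_SERVICE_UP }

    /**
     * Decide how to react to a failed refresh.
     * Assumption: a scheduler configuration failure always ends in a JVM exit.
     */
    public static Action onRefreshFailure(RefreshKind kind, boolean failFast) {
        if (kind == RefreshKind.SCHEDULER) {
            // With a wrong scheduler configuration the service cannot come up at all,
            // so the fail-fast flag makes no difference here.
            return Action.EXIT_JVM;
        }
        // With a wrong yarn-site.xml the service can still run (best effort),
        // so the JVM exits only when fail-fast is enabled.
        return failFast ? Action.EXIT_JVM : Action.KEEP_SERVICE_UP;
    }

    public static void main(String[] args) {
        // Print the decision for every combination of refresh kind and flag.
        for (RefreshKind kind : RefreshKind.values()) {
            for (boolean failFast : new boolean[] {true, false}) {
                System.out.println(kind + " failFast=" + failFast
                        + " -> " + onRefreshFailure(kind, failFast));
            }
        }
    }
}
```

Neither branch falls back to standby, since that is the state in which the elector would
keep retrying and fill up the logs.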

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3893
>                 URL: https://issues.apache.org/jira/browse/YARN-3893
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this:
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh user group failure due to configuration
> Both RMs will continuously try to become active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
