hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refreshAll()
Date Thu, 27 Aug 2015 12:03:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716543#comment-14716543 ]

Naganarasimha G R commented on YARN-3893:
-----------------------------------------

Hi [~bibinchundatt],
Thanks for the patch. The test cases ran fine, and the approach and test case seem fine, but I have a few comments:
# The timeout of 900000 is on the higher side. Is that much required, or was it only for local testing?
# Instead of adding the test case to RMHA, can we consider adding it to TestRMAdminService, since the failure is related to the transition to active?
# When throwing the RMFatalEvent, it may be better to wrap the existing exception in another exception whose message states that the transition to active failed, so that the RM logs clearly show which operation caused the exit. Alternatively, instead of the event type {{ACTIVE_REFRESH_FAIL}}, we could use a more intuitive name such as {{TRANSITION_TO_ACTIVE_FAILED}}.
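A minimal sketch of the third suggestion, under stated assumptions: {{FatalEventType}}, {{FatalEvent}}, {{refreshAll(boolean)}} and {{transitionToActive(boolean)}} below are hypothetical, simplified stand-ins, not the real Hadoop YARN classes or methods. It only illustrates wrapping the original failure in an outer exception whose message names the failed operation, and reporting it under the proposed {{TRANSITION_TO_ACTIVE_FAILED}} type.

```java
import java.util.Optional;

// Hypothetical, simplified stand-ins; these are NOT the real Hadoop YARN classes.
public class TransitionToActiveSketch {

    // Hypothetical event type that models only the proposed, more descriptive name.
    enum FatalEventType { TRANSITION_TO_ACTIVE_FAILED }

    // Hypothetical fatal event carrying a type and the exception to log.
    static final class FatalEvent {
        final FatalEventType type;
        final Exception cause;

        FatalEvent(FatalEventType type, Exception cause) {
            this.type = type;
            this.cause = cause;
        }
    }

    // Hypothetical refresh step; fails when the configuration is invalid
    // (e.g. a misconfigured capacity scheduler XML).
    static void refreshAll(boolean configValid) throws Exception {
        if (!configValid) {
            throw new Exception("Failed to refresh queues");
        }
    }

    // On failure, wraps the original exception in one whose message names the
    // failed operation, and reports it under the proposed event type.
    static Optional<FatalEvent> transitionToActive(boolean configValid) {
        try {
            refreshAll(configValid);
            return Optional.empty(); // transition succeeded, nothing fatal to report
        } catch (Exception e) {
            Exception wrapped = new Exception(
                "Transition to active failed during refreshAll()", e);
            return Optional.of(
                new FatalEvent(FatalEventType.TRANSITION_TO_ACTIVE_FAILED, wrapped));
        }
    }

    public static void main(String[] args) {
        transitionToActive(false).ifPresent(event ->
            System.out.println(event.type + ": " + event.cause.getMessage()
                + " (caused by: " + event.cause.getCause().getMessage() + ")"));
    }
}
```

Running {{main}} prints {{TRANSITION_TO_ACTIVE_FAILED: Transition to active failed during refreshAll() (caused by: Failed to refresh queues)}}, so a log line built from the wrapped exception names both the operation that failed and the root cause.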

> Both RM in active state when Admin#transitionToActive failure from refreshAll()
> -------------------------------------------------------------------------------
>
>                 Key: YARN-3893
>                 URL: https://issues.apache.org/jira/browse/YARN-3893
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this:
> # The capacity scheduler XML is misconfigured during the switchover
> # ACL refresh fails due to the configuration
> # User group refresh fails due to the configuration
> Both RMs will continuously try to become active.
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both web UIs show the RM as active
> # The service state is shown as active for both RMs
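To make the symptom concrete, here is a minimal Java sketch that takes the HA state reported for each RM id (as in the {{yarn rmadmin -getServiceState}} output above) and flags the case where more than one RM is active. The class and method names are hypothetical, and collecting the states (for example by running the command once per RM) is assumed to happen elsewhere.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; not part of Hadoop. Input is RM id -> reported state.
public class ActiveRmCheck {

    // Returns the RM ids whose reported HA state is "active".
    static List<String> activeRms(Map<String, String> statesById) {
        return statesById.entrySet().stream()
            .filter(e -> e.getValue().trim().equalsIgnoreCase("active"))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    // In an HA pair, more than one active RM is the failure described above.
    static boolean isSplitBrain(Map<String, String> statesById) {
        return activeRms(statesById).size() > 1;
    }

    public static void main(String[] args) {
        // States as printed for rm1 and rm2 in the report above.
        Map<String, String> states = new LinkedHashMap<>();
        states.put("rm1", "active");
        states.put("rm2", "active");
        System.out.println("Active RMs: " + activeRms(states));
        System.out.println("Split brain: " + isSplitBrain(states));
    }
}
```

With the states from the report, this prints {{Active RMs: [rm1, rm2]}} followed by {{Split brain: true}}; with one RM in standby, {{isSplitBrain}} returns false.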



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
