Date: Mon, 15 Jun 2015 20:18:01 +0000 (UTC)
From: "Jian He (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3804) Both RM are on standBy state when kerberos user not in yarn.admin.acl

    [ https://issues.apache.org/jira/browse/YARN-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586610#comment-14586610 ]

Jian He commented on YARN-3804:
-------------------------------

+1 for 2)
Not too much point having RM to depend on the admin acl to do transition for itself.

[~kasha], [~xgong], sounds good?

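The point about the RM depending on the admin ACL for its own transition can be illustrated with a toy model. This is only a sketch, not the Hadoop code: `ToyResourceManager`, `AccessControlError`, `elect`, and the `check_acl` flag are hypothetical names, and the bounded retry count is an assumption made so the example terminates (the report describes the real retry as never ending). It also assumes the proposal amounts to skipping the ACL check when the RM itself refreshes the ACLs during a transition.

```python
class AccessControlError(Exception):
    """Stand-in for an access-control failure (toy model only)."""


class ToyResourceManager:
    """Hypothetical model of an RM that refreshes admin ACLs when going active."""

    def __init__(self, rm_user, admin_acl):
        self.rm_user = rm_user            # user the RM runs as, e.g. "yarn"
        self.admin_acl = set(admin_acl)   # users allowed to call admin operations
        self.state = "standby"

    def _refresh_admin_acls(self, caller, check_acl):
        # With the check enabled, a caller outside the ACL is rejected.
        if check_acl and caller not in self.admin_acl:
            raise AccessControlError(
                f"User {caller} doesn't have permission to call 'refreshAdminAcls'"
            )
        # A real RM would reload the ACL here; the toy model has nothing to reload.

    def transition_to_active(self, check_acl):
        # The RM calls the refresh on its own behalf, as its own user.
        try:
            self._refresh_admin_acls(self.rm_user, check_acl)
        except AccessControlError:
            return False
        self.state = "active"
        return True


def elect(rm, check_acl, max_attempts=3):
    """Retry the transition up to max_attempts times; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        if rm.transition_to_active(check_acl):
            return attempt
    return max_attempts


if __name__ == "__main__":
    stuck = ToyResourceManager("yarn", ["dsperf"])
    print(elect(stuck, check_acl=True), stuck.state)

    fixed = ToyResourceManager("yarn", ["dsperf"])
    print(elect(fixed, check_acl=False), fixed.state)
```

In this model the outcome with the check enabled does not depend on the number of retries: the RM user and the ACL are both fixed at startup, so every attempt fails the same way.
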
> Both RM are on standBy state when kerberos user not in yarn.admin.acl
> ---------------------------------------------------------------------
>
>                 Key: YARN-3804
>                 URL: https://issues.apache.org/jira/browse/YARN-3804
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>         Environment: Suse 11 Sp3, 2 RM, Secure
>            Reporter: Bibin A Chundatt
>            Assignee: Varun Saxena
>            Priority: Critical
>
> Steps to reproduce
> ================
> 1. Configure the cluster in secure mode
> 2. On the RM, configure yarn.admin.acl=dsperf
> 3. Configure yarn.resourcemanager.principal=yarn
> 4. Start both RMs
> Both RMs will stay in Standby forever
> {code}
> 2015-06-15 12:20:21,556 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshAdminAcls TARGET=AdminService RESULT=FAILURE DESCRIPTION=Unauthorized userPERMISSIONS=
> 2015-06-15 12:20:21,556 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> 	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:645)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:518)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Can not execute refreshAdminAcls
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
> 	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> 	... 4 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User yarn doesn't have permission to call 'refreshAdminAcls'
> 	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAcls(AdminService.java:230)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAdminAcls(AdminService.java:465)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:295)
> 	... 5 more
> Caused by: org.apache.hadoop.security.AccessControlException: User yarn doesn't have permission to call 'refreshAdminAcls'
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:182)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:148)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAccess(AdminService.java:223)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAcls(AdminService.java:228)
> 	... 7 more
> {code}
> *Analysis*
> On each attempt by an RM to switch to Active, refreshAdminAcls is called, and the user (yarn) does not have the ACL permission for it.
> The same switch to Active is retried indefinitely, and {{ActiveStandbyElector#becomeActive()}} always returns false.
>
> *Expected*
> The RM should get a shutdown event after a few retries, or even at the first attempt,
> since the user under which it retries refreshAdminAcls can never be updated at runtime.
> *States from commands*
>  ./yarn rmadmin -getServiceState rm2
> *standby*
>  ./yarn rmadmin -getServiceState rm1
> *standby*
>  ./yarn rmadmin -checkHealth rm1
> *echo $? = 0*
>  ./yarn rmadmin -checkHealth rm2
> *echo $? = 0*

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)