Date: Fri, 29 Dec 2017 07:10:05 +0000 (UTC)
From: "Vinod Kumar Vavilapalli (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-7692) Resource Manager goes down when a user not included in a priority acl submits a job

    [ https://issues.apache.org/jira/browse/YARN-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-7692:
------------------------------------------
            Priority: Blocker  (was: Major)
    Target Version/s: 3.1.0, 2.9.1, 3.0.1

Marking as a blocker for 2.9.1, 3.0.1 and 3.1.0.

> Resource Manager goes down when a user not included in a priority acl submits a job
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-7692
>                 URL: https://issues.apache.org/jira/browse/YARN-7692
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.0, 2.8.3, 3.0.0
>            Reporter: Charan Hebri
>            Assignee: Sunil G
>            Priority: Blocker
>
> Test scenario
> ------------------
> 1. Create a cluster with no ACLs configured.
> 2. Submit jobs as an existing user, say 'user_a'.
> 3. Enable ACLs and create a priority ACL entry via the property yarn.scheduler.capacity.priority-acls.
>    Do not include the user 'user_a' in this ACL.
> 4. Submit a job as 'user_a'.
> The job is rejected because 'user_a' does not have permission to run it, which is the expected behavior. However, the Resource Manager also goes down: when it tries to recover the previously submitted applications, it fails to recover any of them.
> The exception seen is below:
> {noformat}
> 2017-12-27 10:52:30,064 INFO conf.Configuration (Configuration.java:getConfResourceAsInputStream(2659)) - found resource yarn-site.xml at file:/etc/hadoop/3.0.0.0-636/0/yarn-site.xml
> 2017-12-27 10:52:30,065 INFO scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:setClusterMaxPriority(911)) - Updated the cluste max priority to maxClusterLevelAppPriority = 10
> 2017-12-27 10:52:30,066 INFO resourcemanager.ResourceManager (ResourceManager.java:transitionToActive(1177)) - Transitioning to active state
> 2017-12-27 10:52:30,097 INFO resourcemanager.ResourceManager (ResourceManager.java:serviceStart(765)) - Recovery started
> 2017-12-27 10:52:30,102 INFO recovery.RMStateStore (RMStateStore.java:checkVersion(747)) - Loaded RM state version info 1.5
> 2017-12-27 10:52:30,375 INFO security.RMDelegationTokenSecretManager (RMDelegationTokenSecretManager.java:recover(196)) - recovering RMDelegationTokenSecretManager.
> 2017-12-27 10:52:30,380 INFO resourcemanager.RMAppManager (RMAppManager.java:recover(561)) - Recovering 51 applications
> 2017-12-27 10:52:30,432 INFO resourcemanager.RMAppManager (RMAppManager.java:recover(571)) - Successfully recovered 0 out of 51 applications
> 2017-12-27 10:52:30,432 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(776)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1179)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
>         ... 20 more
> 2017-12-27 10:52:30,434 INFO service.AbstractService (AbstractService.java:noteFailure(273)) - Service RMActiveServices failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
> org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1179)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
>         ... 20 more
> 2017-12-27 10:52:30,435 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics system...
> 2017-12-27 10:52:30,435 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> 2017-12-27 10:52:30,436 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(607)) - ResourceManager metrics system shutdown complete.
> 2017-12-27 10:52:30,436 INFO event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(155)) - AsyncDispatcher is draining to stop, ignoring any new events.
> 2017-12-27 10:52:30,437 INFO event.AsyncDispatcher (AsyncDispatcher.java:register(223)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
> 2017-12-27 10:52:30,438 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:(75)) - NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms
> 2017-12-27 10:52:30,438 INFO security.RMContainerTokenSecretManager (RMContainerTokenSecretManager.java:(79)) - ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms
> 2017-12-27 10:52:30,438 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:(94)) - AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms
> 2017-12-27 10:52:30,439 INFO recovery.RMStateStoreFactory (RMStateStoreFactory.java:getStore(33)) - Using RMStateStore implementation - class org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
> 2017-12-27 10:52:30,439 INFO event.AsyncDispatcher (AsyncDispatcher.java:register(223)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
> 2017-12-27 10:52:30,439 WARN curator.CuratorZookeeperClient (CuratorZookeeperClient.java:(96)) - session timeout [10000] is less than connection timeout [15000]
> 2017-12-27 10:52:30,440 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start(235)) - Starting
> {noformat}
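The failure path in the trace above can be modelled in a few lines. The following Java sketch is illustrative only and is not ResourceManager code: `PriorityAclChecker`, `StoredApp`, `PriorityAclException`, `recoverAll`, and the `revalidateOnRecovery` flag are hypothetical names, and the users and application IDs are made up. It assumes, as the stack trace suggests (`recoverApplication` calls `createAndPopulateNewRMApp`, which calls `checkAndGetApplicationPriority`), that recovery re-runs the submit-time priority ACL check, and that a single failed check aborts the whole recovery, consistent with "Successfully recovered 0 out of 51 applications".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RecoverySketch {

    /** Hypothetical exception standing in for the AccessControlException in the log. */
    static final class PriorityAclException extends RuntimeException {
        PriorityAclException(String message) {
            super(message);
        }
    }

    /** Hypothetical priority ACL: only the listed users may submit with a priority. */
    static final class PriorityAclChecker {
        private final Set<String> allowedUsers;

        PriorityAclChecker(Set<String> allowedUsers) {
            this.allowedUsers = allowedUsers;
        }

        /** Submit-time style check: returns the priority, or throws if the user is not allowed. */
        int checkAndGetPriority(String user, String appId, int requested) {
            if (!allowedUsers.contains(user)) {
                throw new PriorityAclException("User " + user
                        + " does not have permission to submit/update " + appId
                        + " for " + requested);
            }
            return requested;
        }
    }

    /** Hypothetical application record as persisted in the RM state store. */
    static final class StoredApp {
        final String appId;
        final String user;
        final int priority;

        StoredApp(String appId, String user, int priority) {
            this.appId = appId;
            this.user = user;
            this.priority = priority;
        }
    }

    /**
     * Recovers stored applications in order. With revalidateOnRecovery=true the
     * first ACL failure escapes the loop, so no application is recovered at all.
     * With false, the priority accepted at submission time is trusted as stored.
     */
    static List<String> recoverAll(List<StoredApp> apps, PriorityAclChecker acl,
                                   boolean revalidateOnRecovery) {
        List<String> recovered = new ArrayList<>();
        for (StoredApp app : apps) {
            if (revalidateOnRecovery) {
                acl.checkAndGetPriority(app.user, app.appId, app.priority);
            }
            recovered.add(app.appId);
        }
        return recovered;
    }

    public static void main(String[] args) {
        // Both apps were submitted by user_a before the priority ACL existed.
        List<StoredApp> apps = List.of(
                new StoredApp("app_0001", "user_a", 0),
                new StoredApp("app_0002", "user_a", 0));
        // The ACL added later does not include user_a.
        PriorityAclChecker acl = new PriorityAclChecker(Set.of("user_b"));

        try {
            recoverAll(apps, acl, true);
        } catch (PriorityAclException e) {
            System.out.println("Failed to load/recover state: " + e.getMessage());
        }
        System.out.println("Recovered without re-check: "
                + recoverAll(apps, acl, false).size());
    }
}
```

Skipping the re-check for applications that were already accepted is one possible mitigation (new submissions are still checked at submit time); the sketch does not claim that this is the change made for YARN-7692.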