hadoop-yarn-issues mailing list archives

From "Bibin A Chundatt (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8541) RM startup failure on recovery after user deletion
Date Mon, 23 Jul 2018 06:24:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552384#comment-16552384
] 

Bibin A Chundatt commented on YARN-8541:
----------------------------------------

When the queue is already provided, the app will get submitted to the specified queue.
No exception will be thrown.

But that is expected, and it is the old behaviour too, right?
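
To make the point above concrete, here is a minimal sketch of the behaviour being described. It is a simplified, hypothetical model and not the code in UserGroupMappingPlacementRule: the class PlacementSketch, the method place, the exception type, and the two lookup maps are all assumptions made for illustration. It assumes that an explicitly requested queue is kept without any group lookup, and that the group lookup (which can fail for a deleted user) happens only when no queue was specified.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of queue placement; NOT the Hadoop implementation. */
class PlacementSketch {
    static final String DEFAULT_QUEUE = "default";

    /** Stand-in for YarnException in this sketch. */
    static class PlacementException extends Exception {
        PlacementException(String message) {
            super(message);
        }
    }

    /**
     * Returns the queue an app would be placed in under the assumed rules.
     *
     * @param groupsByUser user to groups; a deleted user has no entry (assumption)
     * @param queueByGroup group to mapped queue (assumption)
     */
    static String place(String user, String requestedQueue,
                        Map<String, List<String>> groupsByUser,
                        Map<String, String> queueByGroup) throws PlacementException {
        // Queue already provided: keep it, no group lookup, so no exception.
        if (requestedQueue != null && !requestedQueue.equals(DEFAULT_QUEUE)) {
            return requestedQueue;
        }
        // No explicit queue: the mapping needs the user's groups.
        List<String> groups = groupsByUser.get(user);
        if (groups == null || groups.isEmpty()) {
            throw new PlacementException("No groups found for user " + user);
        }
        for (String group : groups) {
            String mapped = queueByGroup.get(group);
            if (mapped != null) {
                return mapped;
            }
        }
        // No mapping matched: fall back to the default queue.
        return DEFAULT_QUEUE;
    }

    public static void main(String[] args) {
        Map<String, List<String>> noUsers = Map.of(); // "user1" was deleted
        Map<String, String> mappings = Map.of("analysts", "queueB");
        try {
            // Explicit queue: placement succeeds even though the user is gone.
            System.out.println(place("user1", "queueA", noUsers, mappings));
            // Default queue: the group lookup fails for the deleted user.
            System.out.println(place("user1", DEFAULT_QUEUE, noUsers, mappings));
        } catch (PlacementException e) {
            System.out.println("Placement failed: " + e.getMessage());
        }
    }
}
```

In this model only the second call reaches the group lookup, which is the case the stack trace in the issue description corresponds to.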

> RM startup failure on recovery after user deletion
> --------------------------------------------------
>
>                 Key: YARN-8541
>                 URL: https://issues.apache.org/jira/browse/YARN-8541
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: yimeng
>            Assignee: Bibin A Chundatt
>            Priority: Blocker
>         Attachments: YARN-8541.001.patch, YARN-8541.002.patch, YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found that RM startup fails on recovery with the following test steps:
> 1. Create a user "user1" that has permission to submit apps.
> 2. Use user1 to submit a job, and wait for the job to finish.
> 3. Delete user "user1".
> 4. Restart YARN.
> 5. The RM restart fails.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue root: numChildQueue=
3, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0,
numApps=0, numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue mappings, override:
false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized CapacityScheduler with
calculator=class org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, minimumAllocation=<<memory:512,
vCores:1>>, maximumAllocation=<<memory:65536, vCores:32>>, asynchronousScheduling=false,
asyncScheduleInterval=5ms | CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not found |
Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS Processing chain.
Root Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor]. | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement handler will be
used, all scheduling requests will be rejected. | ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor]
tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the winning of
election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>  at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>  at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>  at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active
mode
>  at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>  at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException:
Failed to submit application application_1531624956005_0001 submitted by user super reason:
No groups found for user super
>  at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1241)
>  at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application
application_1531624956005_0001 submitted by user super reason: No groups found for user super
>  at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:206)
>  at org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:68)
>  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:798)
>  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:369)
>  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
>  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:568)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1455)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:828)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  ... 13 more
> 2018-07-16 16:24:59,713 | INFO | main-EventThread | Trying to re-establish ZK session
| ActiveStandbyElector.java:746
> 2018-07-16 16:24:59,715 | INFO | main-EventThread | Session: 0x1100001cdf8c2ea7 closed
| ZooKeeper.java:1325
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | Initiating client connection, connectString=187-4-64-187:24002,187-4-64-119:24002,187-4-64-248:24002
sessionTimeout=45000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@62f6291c
| ZooKeeper.java:861
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | zookeeper.request.timeout configured
value is 120000. | ClientCnxn.java:141
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | zookeeper.client.bind.port.range
is not configured. | ClientCnxn.java:177



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

