hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2834) Resource manager crashed with Null Pointer Exception
Date Sun, 09 Nov 2014 20:31:33 GMT

    [ https://issues.apache.org/jira/browse/YARN-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204101#comment-14204101
] 

Jian He commented on YARN-2834:
-------------------------------

The reason RM fails with NPE is that now if app fails to renew the token, it'll move to FAILED
state and save the app final state. But we didn't fail the RMAppAttempt and save the attempt
final state. So, on recovery, app won't be added to scheduler because app is already in final
state, but it'll still try to add the attempt to scheduler which causes the NPE.

> Resource manager crashed with Null Pointer Exception
> ----------------------------------------------------
>
>                 Key: YARN-2834
>                 URL: https://issues.apache.org/jira/browse/YARN-2834
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Jian He
>
> Resource manager failed after restart. 
> {noformat}
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler (CapacityScheduler.java:initializeQueues(467))
- Initialized root queue root: numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0,
vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler (CapacityScheduler.java:initializeQueueMappings(436))
- Initialized queue mappings, override: false
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler (CapacityScheduler.java:initScheduler(305))
- Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator,
minimumAllocation=<<memory:256, vCores:1>>, maximumAllocation=<<memory:2048,
vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
> 2014-11-09 04:12:53,015 INFO  service.AbstractService (AbstractService.java:noteFailure(272))
- Service ResourceManager failed in state STARTED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1041)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1005)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:821)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:843)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:826)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:701)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message