hadoop-mapreduce-user mailing list archives

From Alexandru Pacurar <Alexandru.Pacu...@PropertyShark.com>
Subject ResourceManager fails to start
Date Fri, 26 Jun 2015 13:10:55 GMT
Hello,

I'm running Hadoop 2.6 and have run into a problem with the ResourceManager. After a
restart, the ResourceManager refuses to start and logs the following error:

2015-06-26 08:54:10,342 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:recover(796))
- Recovering attempt: appattempt_1435159945366_0792_000001 with final state: null
2015-06-26 08:54:10,342 INFO  security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195))
- Create AMRMToken for ApplicationAttempt: appattempt_1435159945366_0792_000001
2015-06-26 08:54:10,342 INFO  security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307))
- Creating password for appattempt_1435159945366_0792_000001
2015-06-26 08:54:10,343 INFO  resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(670))
- Registering app attempt : appattempt_1435159945366_0792_000001
2015-06-26 08:54:10,344 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(594))
- Failed to load/recover state
java.lang.NullPointerException
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
                at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
                at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
                at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
                at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
                at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
                at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
2015-06-26 08:54:10,348 INFO  service.AbstractService (AbstractService.java:noteFailure(272))
- Service RMActiveServices failed in state STARTED; cause: java.lang.NullPointerException
java.lang.NullPointerException
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
                at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
                at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
                at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
                at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
                at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
                at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
2015-06-26 08:54:10,350 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) -
Stopping ResourceManager metrics system...
2015-06-26 08:54:10,417 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(135))
- timeline thread interrupted.
2015-06-26 08:54:10,419 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) -
ResourceManager metrics system stopped.
2015-06-26 08:54:10,420 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(605))
- ResourceManager metrics system shutdown complete.
2015-06-26 08:54:10,437 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x34e2fddab0e0001
closed
2015-06-26 08:54:10,437 INFO  event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(138))
- AsyncDispatcher is draining to stop, igonring any new events.
2015-06-26 08:54:10,437 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread
shut down
2015-06-26 08:54:10,439 INFO  event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(138))
- AsyncDispatcher is draining to stop, igonring any new events.
2015-06-26 08:54:10,439 INFO  service.AbstractService (AbstractService.java:noteFailure(272))
- Service Dispatcher failed in state STOPPED; cause: java.lang.NullPointerException
java.lang.NullPointerException

After some searching I found that the yarn.resourcemanager.store.class property selects where
the ResourceManager persists its state. My value is org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore,
so the state is kept in ZooKeeper.
My question is: should I just remove appattempt_1435159945366_0792_000001 (and any other attempts)
from ZooKeeper to get my ResourceManager up, is there a way to make it skip specific
attempts, or could I simply recreate the state store from scratch? I don't care about
the running applications; I would just like to have the ResourceManager service up.
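
To make the first option concrete, here is a minimal sketch of where the failing attempt would probably live in ZooKeeper. It assumes the default yarn.resourcemanager.zk-state-store.parent-path of /rmstore and the ZKRMStateRoot/RMAppRoot layout that ZKRMStateStore appears to use in 2.6; neither has been checked against this cluster. The helper name attempt_znode_path is hypothetical, and the code only builds a path string without connecting to ZooKeeper.

```python
import re

# Assumption: default value of yarn.resourcemanager.zk-state-store.parent-path.
DEFAULT_PARENT_PATH = "/rmstore"

# appattempt_<cluster timestamp>_<application sequence>_<attempt sequence>
ATTEMPT_ID = re.compile(r"^appattempt_(\d+)_(\d+)_(\d+)$")


def attempt_znode_path(attempt_id, parent_path=DEFAULT_PARENT_PATH):
    """Build the znode path where the attempt is assumed to be stored.

    Hypothetical helper: the ZKRMStateRoot/RMAppRoot layout is an
    assumption and should be verified with a ZooKeeper client first.
    """
    match = ATTEMPT_ID.match(attempt_id)
    if match is None:
        raise ValueError("not an application attempt ID: %r" % attempt_id)
    cluster_timestamp, app_seq, _attempt_seq = match.groups()
    # The owning application ID shares the timestamp and sequence number.
    app_id = "application_%s_%s" % (cluster_timestamp, app_seq)
    return "/".join(
        [parent_path.rstrip("/"), "ZKRMStateRoot", "RMAppRoot", app_id, attempt_id]
    )


if __name__ == "__main__":
    print(attempt_znode_path("appattempt_1435159945366_0792_000001"))
```

If the layout assumption holds, the printed path is the node to inspect (and back up) before deleting anything by hand.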

Thank you,
Alex
