hadoop-yarn-issues mailing list archives

From "Arpit Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
Date Thu, 10 Apr 2014 21:38:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965886#comment-13965886 ]

Arpit Gupta commented on YARN-1924:
-----------------------------------

Here is the stack trace.

{code}
cheduler from user hrt_qa in queue default
2014-04-10 09:19:35,907 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(659)) - appattempt_1397121188061_0004_000002 State change from SUBMITTED to SCHEDULED
2014-04-10 09:19:36,095 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(639)) - application_1397121188061_0004 State change from ACCEPTED to KILLING
2014-04-10 09:19:36,096 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(986)) - Updating application attempt appattempt_1397121188061_0004_000002 with final state: KILLED
2014-04-10 09:19:36,096 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(659)) - appattempt_1397121188061_0004_000002 State change from SCHEDULED to FINAL_SAVING
2014-04-10 09:19:36,103 ERROR recovery.RMStateStore (RMStateStore.java:handleStoreEvent(681)) - Error storing appAttempt: appattempt_1397121188061_0004_000002
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:834)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:831)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:930)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:949)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:831)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:845)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:862)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:604)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:662)
2014-04-10 09:19:36,107 FATAL resourcemanager.ResourceManager (ResourceManager.java:handle(657)) - Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:834)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:831)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:930)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:949)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:831)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:845)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:862)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:604)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:662)

2014-04-10 09:19:36,108 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
{code}

> RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
> ------------------------------------------------------------
>
>                 Key: YARN-1924
>                 URL: https://issues.apache.org/jira/browse/YARN-1924
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Jian He
>            Priority: Critical
>
> Noticed on an HA cluster: both RMs shut down with this error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
