hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "dongtingting (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk
Date Thu, 28 Apr 2016 09:22:12 GMT

     [ https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

dongtingting updated YARN-5006:
-------------------------------
    Description: 
Client submit a job, this job add 10000 file into DistributedCache. when the job is submitted,
ResourceManager sotre ApplicationStateData into zk. ApplicationStateData  is exceed the limit
size of znode. RM exit 1.   

The related code in RMStateStore.java :
{code}
  private static class StoreAppTransition
      implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
    @Override
    public void transition(RMStateStore store, RMStateStoreEvent event) {
      if (!(event instanceof RMStateStoreAppEvent)) {
        // should never happen
        LOG.error("Illegal event type: " + event.getClass());
        return;
      }
      ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState();
      ApplicationId appId = appState.getAppId();
      ApplicationStateData appStateData = ApplicationStateData
          .newInstance(appState);
      LOG.info("Storing info for app: " + appId);
      try {  
        store.storeApplicationStateInternal(appId, appStateData);  //store the appStateData
        store.notifyApplication(new RMAppEvent(appId,
               RMAppEventType.APP_NEW_SAVED));
      } catch (Exception e) {
        LOG.error("Error storing app: " + appId, e);
        store.notifyStoreOperationFailed(e);   //handle fail event, system exit 
      }
    };
  }
{code}

The Exception log:
{code}
 ...
2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
AsyncDispatcher event handler: Maxed out ZK retries. Giving up!

2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore
AsyncDispatcher event handler: Error storing app: application_1461061795989_17671
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
        at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:724)

   ...
2016-04-20 11:26:45,613 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
AsyncDispatcher event handler: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent
of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
        at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore
.java:860)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:724)
2016-04-20 11:26:45,615 INFO org.apache.hadoop.util.ExitUtil AsyncDispatcher event handler:
Exiting with status 1
2016-04-20 11:26:45,622 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-17,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
2016-04-20 11:26:45,623 INFO org.mortbay.log Thread-1: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@10.126.81.251:9088
2016-04-20 11:26:45,623 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-21,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
2016-04-20 11:26:45,624 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-19,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
2016-04-20 11:26:45,724 INFO org.apache.hadoop.ipc.Server Thread-1: Stopping server on 9033
2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server listener on 9033: Stopping
IPC Server listener on 9033
2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Yielding
from election
2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server Responder: Stopping IPC
Server Responder
2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Deleting
bread-crumb of active node...
2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ZooKeeper Thread-1: Session: 0x2504c1df9409094
closed
2016-04-20 11:26:45,729 WARN org.apache.hadoop.ha.ActiveStandbyElector main-EventThread: Ignoring
stale result from old client with sessionId 0x2504c1df9409094
2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ClientCnxn main-EventThread: EventThread
shut down

{code}


  was:
Client submit a job, this job add 10000 file into DistributedCache. when the job is submitted,
ResourceManager sotre ApplicationStateData inot zk. ApplicationStateData  is exceed the limit
size of znode. RM exit 1.  Exception log:
{code}
 ...
2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
AsyncDispatcher event handler: Maxed out ZK retries. Giving up!

2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore
AsyncDispatcher event handler: Error storing app: application_1461061795989_17671
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
        at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:724)

   ...
2016-04-20 11:26:45,613 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
AsyncDispatcher event handler: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent
of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
        at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore
.java:860)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:724)
2016-04-20 11:26:45,615 INFO org.apache.hadoop.util.ExitUtil AsyncDispatcher event handler:
Exiting with status 1
2016-04-20 11:26:45,622 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-17,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
2016-04-20 11:26:45,623 INFO org.mortbay.log Thread-1: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@10.126.81.251:9088
2016-04-20 11:26:45,623 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-21,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
2016-04-20 11:26:45,624 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-19,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
2016-04-20 11:26:45,724 INFO org.apache.hadoop.ipc.Server Thread-1: Stopping server on 9033
2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server listener on 9033: Stopping
IPC Server listener on 9033
2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Yielding
from election
2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server Responder: Stopping IPC
Server Responder
2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Deleting
bread-crumb of active node...
2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ZooKeeper Thread-1: Session: 0x2504c1df9409094
closed
2016-04-20 11:26:45,729 WARN org.apache.hadoop.ha.ActiveStandbyElector main-EventThread: Ignoring
stale result from old client with sessionId 0x2504c1df9409094
2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ClientCnxn main-EventThread: EventThread
shut down

{code}



> ResourceManager quit due to ApplicationStateData exceed the limit  size of znode in zk
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-5006
>                 URL: https://issues.apache.org/jira/browse/YARN-5006
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0, 2.7.2
>            Reporter: dongtingting
>            Priority: Critical
>
> Client submit a job, this job add 10000 file into DistributedCache. when the job is submitted,
ResourceManager sotre ApplicationStateData into zk. ApplicationStateData  is exceed the limit
size of znode. RM exit 1.   
> The related code in RMStateStore.java :
> {code}
>   private static class StoreAppTransition
>       implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
>     @Override
>     public void transition(RMStateStore store, RMStateStoreEvent event) {
>       if (!(event instanceof RMStateStoreAppEvent)) {
>         // should never happen
>         LOG.error("Illegal event type: " + event.getClass());
>         return;
>       }
>       ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState();
>       ApplicationId appId = appState.getAppId();
>       ApplicationStateData appStateData = ApplicationStateData
>           .newInstance(appState);
>       LOG.info("Storing info for app: " + appId);
>       try {  
>         store.storeApplicationStateInternal(appId, appStateData);  //store the appStateData
>         store.notifyApplication(new RMAppEvent(appId,
>                RMAppEventType.APP_NEW_SAVED));
>       } catch (Exception e) {
>         LOG.error("Error storing app: " + appId, e);
>         store.notifyStoreOperationFailed(e);   //handle fail event, system exit 
>       }
>     };
>   }
> {code}
> The Exception log:
> {code}
>  ...
> 2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore
AsyncDispatcher event handler: Error storing app: application_1461061795989_17671
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
>         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:724)
>    ...
> 2016-04-20 11:26:45,613 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
AsyncDispatcher event handler: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent
of type STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
>         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore
> .java:860)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:724)
> 2016-04-20 11:26:45,615 INFO org.apache.hadoop.util.ExitUtil AsyncDispatcher event handler:
Exiting with status 1
> 2016-04-20 11:26:45,622 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-17,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
> 2016-04-20 11:26:45,623 INFO org.mortbay.log Thread-1: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@10.126.81.251:9088
> 2016-04-20 11:26:45,623 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-21,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
> 2016-04-20 11:26:45,624 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Thread[Thread-19,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep
interrupted
> 2016-04-20 11:26:45,724 INFO org.apache.hadoop.ipc.Server Thread-1: Stopping server on
9033
> 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server listener on 9033:
Stopping IPC Server listener on 9033
> 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Yielding
from election
> 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server Responder: Stopping
IPC Server Responder
> 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Deleting
bread-crumb of active node...
> 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ZooKeeper Thread-1: Session: 0x2504c1df9409094
closed
> 2016-04-20 11:26:45,729 WARN org.apache.hadoop.ha.ActiveStandbyElector main-EventThread:
Ignoring stale result from old client with sessionId 0x2504c1df9409094
> 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ClientCnxn main-EventThread: EventThread
shut down
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message