Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4383E19783 for ; Thu, 28 Apr 2016 09:22:13 +0000 (UTC) Received: (qmail 14080 invoked by uid 500); 28 Apr 2016 09:22:13 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 13935 invoked by uid 500); 28 Apr 2016 09:22:13 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 13903 invoked by uid 99); 28 Apr 2016 09:22:13 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Apr 2016 09:22:13 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id E34502C0451 for ; Thu, 28 Apr 2016 09:22:12 +0000 (UTC) Date: Thu, 28 Apr 2016 09:22:12 +0000 (UTC) From: "dongtingting (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dongtingting updated YARN-5006: ------------------------------- Description: Client submit a job, this job add 10000 file into DistributedCache. when the job is submitted, ResourceManager sotre ApplicationStateData into zk. ApplicationStateData is exceed the limit size of znode. RM exit 1. The related code in RMStateStore.java : {code} private static class StoreAppTransition implements SingleArcTransition { @Override public void transition(RMStateStore store, RMStateStoreEvent event) { if (!(event instanceof RMStateStoreAppEvent)) { // should never happen LOG.error("Illegal event type: " + event.getClass()); return; } ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState(); ApplicationId appId = appState.getAppId(); ApplicationStateData appStateData = ApplicationStateData .newInstance(appState); LOG.info("Storing info for app: " + appId); try { store.storeApplicationStateInternal(appId, appStateData); //store the appStateData store.notifyApplication(new RMAppEvent(appId, RMAppEventType.APP_NEW_SAVED)); } catch (Exception e) { LOG.error("Error storing app: " + appId, e); store.notifyStoreOperationFailed(e); //handle fail event, system exit } }; } {code} The Exception log: {code} ... 2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore AsyncDispatcher event handler: Maxed out ZK retries. Giving up! 2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore AsyncDispatcher event handler: Error storing app: application_1461061795989_17671 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:724) ... 2016-04-20 11:26:45,613 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager AsyncDispatcher event handler: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore .java:860) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:724) 2016-04-20 11:26:45,615 INFO org.apache.hadoop.util.ExitUtil AsyncDispatcher event handler: Exiting with status 1 2016-04-20 11:26:45,622 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-17,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2016-04-20 11:26:45,623 INFO org.mortbay.log Thread-1: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@10.126.81.251:9088 2016-04-20 11:26:45,623 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-21,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2016-04-20 11:26:45,624 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-19,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2016-04-20 11:26:45,724 INFO org.apache.hadoop.ipc.Server Thread-1: Stopping server on 9033 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server listener on 9033: Stopping IPC Server listener on 9033 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Yielding from election 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server Responder: Stopping IPC Server Responder 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Deleting bread-crumb of active node... 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ZooKeeper Thread-1: Session: 0x2504c1df9409094 closed 2016-04-20 11:26:45,729 WARN org.apache.hadoop.ha.ActiveStandbyElector main-EventThread: Ignoring stale result from old client with sessionId 0x2504c1df9409094 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ClientCnxn main-EventThread: EventThread shut down {code} was: Client submit a job, this job add 10000 file into DistributedCache. when the job is submitted, ResourceManager sotre ApplicationStateData inot zk. ApplicationStateData is exceed the limit size of znode. RM exit 1. Exception log: {code} ... 2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore AsyncDispatcher event handler: Maxed out ZK retries. Giving up! 2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore AsyncDispatcher event handler: Error storing app: application_1461061795989_17671 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:724) ... 2016-04-20 11:26:45,613 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager AsyncDispatcher event handler: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore .java:860) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:724) 2016-04-20 11:26:45,615 INFO org.apache.hadoop.util.ExitUtil AsyncDispatcher event handler: Exiting with status 1 2016-04-20 11:26:45,622 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-17,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2016-04-20 11:26:45,623 INFO org.mortbay.log Thread-1: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@10.126.81.251:9088 2016-04-20 11:26:45,623 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-21,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2016-04-20 11:26:45,624 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-19,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2016-04-20 11:26:45,724 INFO org.apache.hadoop.ipc.Server Thread-1: Stopping server on 9033 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server listener on 9033: Stopping IPC Server listener on 9033 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Yielding from election 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server Responder: Stopping IPC Server Responder 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Deleting bread-crumb of active node... 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ZooKeeper Thread-1: Session: 0x2504c1df9409094 closed 2016-04-20 11:26:45,729 WARN org.apache.hadoop.ha.ActiveStandbyElector main-EventThread: Ignoring stale result from old client with sessionId 0x2504c1df9409094 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ClientCnxn main-EventThread: EventThread shut down {code} > ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk > -------------------------------------------------------------------------------------- > > Key: YARN-5006 > URL: https://issues.apache.org/jira/browse/YARN-5006 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.6.0, 2.7.2 > Reporter: dongtingting > Priority: Critical > > Client submit a job, this job add 10000 file into DistributedCache. when the job is submitted, ResourceManager sotre ApplicationStateData into zk. ApplicationStateData is exceed the limit size of znode. RM exit 1. > The related code in RMStateStore.java : > {code} > private static class StoreAppTransition > implements SingleArcTransition { > @Override > public void transition(RMStateStore store, RMStateStoreEvent event) { > if (!(event instanceof RMStateStoreAppEvent)) { > // should never happen > LOG.error("Illegal event type: " + event.getClass()); > return; > } > ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState(); > ApplicationId appId = appState.getAppId(); > ApplicationStateData appStateData = ApplicationStateData > .newInstance(appState); > LOG.info("Storing info for app: " + appId); > try { > store.storeApplicationStateInternal(appId, appStateData); //store the appStateData > store.notifyApplication(new RMAppEvent(appId, > RMAppEventType.APP_NEW_SAVED)); > } catch (Exception e) { > LOG.error("Error storing app: " + appId, e); > store.notifyStoreOperationFailed(e); //handle fail event, system exit > } > }; > } > {code} > The Exception log: > {code} > ... > 2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore AsyncDispatcher event handler: Maxed out ZK retries. Giving up! > 2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore AsyncDispatcher event handler: Error storing app: application_1461061795989_17671 > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123) > at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855) > at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:724) > ... > 2016-04-20 11:26:45,613 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager AsyncDispatcher event handler: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123) > at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore > .java:860) > at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855) > at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:724) > 2016-04-20 11:26:45,615 INFO org.apache.hadoop.util.ExitUtil AsyncDispatcher event handler: Exiting with status 1 > 2016-04-20 11:26:45,622 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-17,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted > 2016-04-20 11:26:45,623 INFO org.mortbay.log Thread-1: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@10.126.81.251:9088 > 2016-04-20 11:26:45,623 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-21,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted > 2016-04-20 11:26:45,624 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-19,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted > 2016-04-20 11:26:45,724 INFO org.apache.hadoop.ipc.Server Thread-1: Stopping server on 9033 > 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server listener on 9033: Stopping IPC Server listener on 9033 > 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Yielding from election > 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server Responder: Stopping IPC Server Responder > 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Deleting bread-crumb of active node... > 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ZooKeeper Thread-1: Session: 0x2504c1df9409094 closed > 2016-04-20 11:26:45,729 WARN org.apache.hadoop.ha.ActiveStandbyElector main-EventThread: Ignoring stale result from old client with sessionId 0x2504c1df9409094 > 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ClientCnxn main-EventThread: EventThread shut down > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)