hadoop-yarn-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk
Date Wed, 14 Jun 2017 11:46:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049089#comment-16049089 ]

Hadoop QA commented on YARN-5006:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 51s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  5s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 27s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 10s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 53s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 391 unchanged - 2 fixed = 394 total (was 393) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 33s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 24s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 13s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m  4s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-5006 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872953/YARN-5006.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  xml  |
| uname | Linux 57811cff345b 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6ed54f3 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16184/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16184/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/16184/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16184/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16184/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> ResourceManager quit due to ApplicationStateData exceed the limit  size of znode in zk
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-5006
>                 URL: https://issues.apache.org/jira/browse/YARN-5006
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0, 2.7.2
>            Reporter: dongtingting
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-5006.001.patch, YARN-5006.002.patch, YARN-5006.003.patch
>
>
> A client submits a job that adds 10000 files to the DistributedCache. When the job is submitted, the ResourceManager stores the ApplicationStateData in ZooKeeper. The ApplicationStateData exceeds the znode size limit, and the RM exits with status 1.
> The related code in RMStateStore.java:
> {code}
>   private static class StoreAppTransition
>       implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
>     @Override
>     public void transition(RMStateStore store, RMStateStoreEvent event) {
>       if (!(event instanceof RMStateStoreAppEvent)) {
>         // should never happen
>         LOG.error("Illegal event type: " + event.getClass());
>         return;
>       }
>       ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState();
>       ApplicationId appId = appState.getAppId();
>       ApplicationStateData appStateData = ApplicationStateData
>           .newInstance(appState);
>       LOG.info("Storing info for app: " + appId);
>       try {
>         // Store the appStateData in the state store.
>         store.storeApplicationStateInternal(appId, appStateData);
>         store.notifyApplication(new RMAppEvent(appId,
>                RMAppEventType.APP_NEW_SAVED));
>       } catch (Exception e) {
>         LOG.error("Error storing app: " + appId, e);
>         // Handle the failure event; this is the path that ends in the RM exiting.
>         store.notifyStoreOperationFailed(e);
>       }
>     }
>   }
> {code}
> The Exception log:
> {code}
>  ...
> 2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore AsyncDispatcher event handler: Error storing app: application_1461061795989_17671
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
>         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:724)
>    ...
> 2016-04-20 11:26:45,613 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager AsyncDispatcher event handler: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
>         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:855)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:724)
> 2016-04-20 11:26:45,615 INFO org.apache.hadoop.util.ExitUtil AsyncDispatcher event handler: Exiting with status 1
> 2016-04-20 11:26:45,622 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-17,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
> 2016-04-20 11:26:45,623 INFO org.mortbay.log Thread-1: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@10.0.0.1:9088
> 2016-04-20 11:26:45,623 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-21,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
> 2016-04-20 11:26:45,624 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager Thread[Thread-19,5,main]: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
> 2016-04-20 11:26:45,724 INFO org.apache.hadoop.ipc.Server Thread-1: Stopping server on 9033
> 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server listener on 9033: Stopping IPC Server listener on 9033
> 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Yielding from election
> 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ipc.Server IPC Server Responder: Stopping IPC Server Responder
> 2016-04-20 11:26:45,725 INFO org.apache.hadoop.ha.ActiveStandbyElector Thread-1: Deleting bread-crumb of active node...
> 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ZooKeeper Thread-1: Session: 0x2504c1df9409094 closed
> 2016-04-20 11:26:45,729 WARN org.apache.hadoop.ha.ActiveStandbyElector main-EventThread: Ignoring stale result from old client with sessionId 0x2504c1df9409094
> 2016-04-20 11:26:45,729 INFO org.apache.zookeeper.ClientCnxn main-EventThread: EventThread shut down
> {code}
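
For illustration only: the issue above describes an oversized ApplicationStateData write that surfaces as ConnectionLoss and then a fatal STATE_STORE_OP_FAILED. One way to picture a mitigation is a size check that runs before the write is attempted, so only the offending application is rejected. The sketch below is not taken from any of the attached patches. ZnodeSizeGuard, fits, checkBeforeStore and ASSUMED_MAX_ZNODE_BYTES are hypothetical names, and the limit is an assumption based on ZooKeeper's default jute.maxbuffer of 0xfffff bytes. The full request also carries the path and ACLs, so a practical limit would need to be somewhat lower.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; it is not part of Hadoop or ZooKeeper.
final class ZnodeSizeGuard {

    // Assumption: matches ZooKeeper's default jute.maxbuffer (0xfffff bytes).
    // A real deployment should read the limit from its own configuration.
    static final int ASSUMED_MAX_ZNODE_BYTES = 0xfffff;

    // Returns true when the serialized state is small enough to attempt the write.
    static boolean fits(byte[] serializedState, int limitBytes) {
        return serializedState.length <= limitBytes;
    }

    // Throws before any ZooKeeper call is made, so the caller can fail only
    // the offending application instead of treating it as a store failure.
    static void checkBeforeStore(String appId, byte[] serializedState, int limitBytes) {
        if (!fits(serializedState, limitBytes)) {
            throw new IllegalArgumentException("State for " + appId + " is "
                + serializedState.length + " bytes, above the limit of "
                + limitBytes + " bytes");
        }
    }

    public static void main(String[] args) {
        byte[] small = "a few cache entries".getBytes(StandardCharsets.UTF_8);
        // Placeholder for a state bloated by a very large DistributedCache list.
        byte[] large = new byte[2 * 1024 * 1024];

        checkBeforeStore("application_small", small, ASSUMED_MAX_ZNODE_BYTES);
        System.out.println("small state accepted");

        try {
            checkBeforeStore("application_large", large, ASSUMED_MAX_ZNODE_BYTES);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In a store transition like the one quoted above, such a check would sit just before the call to storeApplicationStateInternal, with the rejection routed to the application rather than to notifyStoreOperationFailed. Whether that matches the approach in YARN-5006.003.patch is not something this message states.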



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

