hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3742) YARN RM will shut down if ZKClient creation times out
Date Fri, 21 Apr 2017 23:49:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979612#comment-15979612
] 

Karthik Kambatla commented on YARN-3742:
----------------------------------------

That looks much cleaner. Thanks for updating the patch. 

Couple of nits:
# When creating RMFatalEvent, start the message with capital letters in AdminService, ActiveStandbyElectorBasedElectorService
and RMCriticalThreadUncaughtExceptionHandler
# RMCriticalThread...: Would "Critical thread, %s, exited unexpectedly" be a grammatically
better message?


> YARN RM  will shut down if ZKClient creation times out 
> -------------------------------------------------------
>
>                 Key: YARN-3742
>                 URL: https://issues.apache.org/jira/browse/YARN-3742
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Daniel Templeton
>         Attachments: YARN-3742.001.patch, YARN-3742.002.patch
>
>
> The RM goes down showing the following stacktrace if the ZK client connection fails to
be created. We should not exit but transition to StandBy and stop doing things and let the
other RM take over.
> {code}
> 2015-04-19 01:22:20,513  FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED.
Cause:
> java.io.IOException: Wait for ZKClient creation timed out
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1066)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1090)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:996)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationStateInternal(ZKRMStateStore.java:643)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:162)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:147)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message