hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Eagles (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time
Date Tue, 22 Aug 2017 16:52:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137029#comment-16137029
] 

Jonathan Eagles edited comment on YARN-2416 at 8/22/17 4:51 PM:
----------------------------------------------------------------

YARN-7044 is the cause of the test failure. [~jlowe], can you review the 003 version of this
patch. I have added ignored transitions for the states that were missing the LAUNCHED event.


was (Author: jeagles):
YARN-7044 is the cause of the test failure. [~jlowe], can you review the 002 version of this
patch. I have added ignored transitions for the states that were missing the LAUNCHED event.

> InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response
for startContainers() call in time
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2416
>                 URL: https://issues.apache.org/jira/browse/YARN-2416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Jian Fang
>            Assignee: Jonathan Eagles
>            Priority: Critical
>         Attachments: YARN-2416.001.patch, YARN-2416.002.patch, YARN-2416.003.patch
>
>
> AMLauncher calls startContainers(allRequests) to launch a container for application master.
Normally, the call comes back immediately so that the RMAppAttempt changes its state from
ALLOCATED to LAUNCHED. 
> However, we do observed that in some cases, the RPC call came back very late but the
AM container was already started. Because the RMAppAttempt stuck in ALLOCATED state, once
resource manager received the REGISTERED event from the application master, it threw InvalidStateTransitonException
as follows.
> 2014-07-05 08:59:05,021 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: REGISTERED
at ALLOCATED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:744)
> For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, resource manager
kept throwing InvalidStateTransitonException.
> 2014-07-05 08:59:06,152 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE
at ALLOCATED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-07-05 08:59:07,779 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1404549222428_0001_02_000002 Container Transitioned from NEW to
>  ALLOCATED
> 2014-07-05 08:59:07,779 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED
at ALLOCATED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:744)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message