hadoop-yarn-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-214) RMContainerImpl does not handle event EXPIRE at state RUNNING
Date Fri, 16 Nov 2012 17:24:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498941#comment-13498941 ]

Robert Joseph Evans commented on YARN-214:

Now that I understand the code better, I think that ignoring the EXPIRE at the RUNNING state
makes sense.  The EXPIRE event only happens when a container has been waiting in allocated
for more than 10 min (default config).  This really would only happen when an App has gotten
a container and forgotten about it, or when the RM is running very slowly and has not processed
the transition events by the time the EXPIRE event is sent.

We register for the EXPIRE event in the AcquiredTransition going to the ACQUIRED state, so we
need to handle the EXPIRE event in every state that is reachable from the ACQUIRED state and
has not already processed the EXPIRE event.  This means we need to handle it in the KILLED,
RUNNING, COMPLETED, and RELEASED states.  We need to add this handling to KILLED and RELEASED too.
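
To illustrate the reasoning above, here is a minimal sketch of a table-driven state machine in
which EXPIRE is ignored (a self-transition) in every state reachable from ACQUIRED.  This is not
the real RMContainerImpl or StateMachineFactory code: the class ContainerStateSketch, its
transition table, and the handle() method are hypothetical, and the transition graph is a
simplified assumption based on the states named in this comment.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

/** Hypothetical, simplified model of a container state machine (not the real RMContainerImpl). */
class ContainerStateSketch {
    enum State { ALLOCATED, ACQUIRED, RUNNING, COMPLETED, RELEASED, KILLED, EXPIRED }
    enum Event { ACQUIRED, LAUNCHED, FINISHED, RELEASED, KILL, EXPIRE }

    // TRANSITIONS.get(state).get(event) gives the next state; a missing entry is an invalid event.
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    private static void add(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    static {
        // Assumed, simplified transition graph.
        add(State.ALLOCATED, Event.ACQUIRED, State.ACQUIRED);
        add(State.ACQUIRED, Event.LAUNCHED, State.RUNNING);
        add(State.ACQUIRED, Event.FINISHED, State.COMPLETED);
        add(State.ACQUIRED, Event.RELEASED, State.RELEASED);
        add(State.ACQUIRED, Event.KILL, State.KILLED);
        add(State.ACQUIRED, Event.EXPIRE, State.EXPIRED);
        add(State.RUNNING, Event.FINISHED, State.COMPLETED);
        add(State.RUNNING, Event.RELEASED, State.RELEASED);
        add(State.RUNNING, Event.KILL, State.KILLED);

        // The expiry timer is armed on entering ACQUIRED, so a late EXPIRE can arrive in any
        // state reachable from ACQUIRED. Ignore it there by mapping it to a self-transition.
        for (State s : EnumSet.of(State.RUNNING, State.COMPLETED, State.RELEASED, State.KILLED)) {
            add(s, Event.EXPIRE, s);
        }
    }

    private State state = State.ALLOCATED;

    State state() {
        return state;
    }

    /** Applies an event, or throws if the current state has no transition registered for it. */
    void handle(Event event) {
        Map<Event, State> row = TRANSITIONS.get(state);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        state = next;
    }

    public static void main(String[] args) {
        ContainerStateSketch container = new ContainerStateSketch();
        container.handle(Event.ACQUIRED);  // expiry timer would be armed here
        container.handle(Event.LAUNCHED);  // container starts just as the timer fires
        container.handle(Event.EXPIRE);    // late EXPIRE is ignored instead of throwing
        System.out.println(container.state());
    }
}
```

Without the loop that registers the EXPIRE self-transitions, the last handle() call in main would
throw, which corresponds to the invalid transition error quoted in the issue description below.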
> RMContainerImpl does not handle event EXPIRE at state RUNNING
> -------------------------------------------------------------
>                 Key: YARN-214
>                 URL: https://issues.apache.org/jira/browse/YARN-214
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Assignee: Jonathan Eagles
>         Attachments: YARN-214.patch, YARN-214.patch, YARN-214.patch, YARN-214.patch
> RMContainerImpl has a race condition where a container can enter the RUNNING state just
> as the container expires.  This results in an invalid event transition error:
> {noformat}
> 2012-11-11 05:31:38,954 [ResourceManager Event Processor] ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: EXPIRE at
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:205)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:44)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerCompleted(SchedulerApp.java:203)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1337)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:739)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:659)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> EXPIRE needs to be handled (well, at least ignored) in the RUNNING state to account for
> this race condition.
