hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
Date Wed, 03 Jan 2018 22:47:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310417#comment-16310417
] 

Jason Lowe commented on YARN-7663:
----------------------------------

Thanks for updating the patch!

The unit test passes even without the fix, so it does not work well as a regression test.
 The problem is that the test needs some way to recognize an invalid transition occurred.
 Currently it simply logs a message which is not detectable by a unit test.  One way to work
around this is to have RMAppImpl call a new, empty protected method, e.g.: onInvalidStateTransition,
that the unit test can override to detect when invalid transitions occur.

Speaking of invalid transitions, there's another one happening in the unit test.  In the test's
output:
{noformat}
2018-01-03 15:41:13,044 ERROR [Thread-96] rmapp.RMAppImpl (RMAppImpl.java:handle(886)) - App:
application_1515015672238_0022 can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: APP_UPDATE_SAVED
at KILLED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:884)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.sendAppUpdateSavedEvent(TestRMAppTransitions.java:492)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppStartAfterKilled(TestRMAppTransitions.java:1168)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}

When an app is killed in the NEW state it skips storing it to the state store and simply moves
it to the killed state.  That's why we get an invalid state transition when we try to send
a state store event -- the state store should not be involved anymore once we're in the KILLED
state.  Actually that's probably a bug -- I suspect apps that are killed from the NEW state
don't end up getting recovered on RM restart.  That could be fixed as part of this JIRA, but
it may be better to defer that to another JIRA.

It would be good to cleanup the whitespace nits identified by checkstyle as well.

> RMAppImpl:Invalid event: START at KILLED
> ----------------------------------------
>
>                 Key: YARN-7663
>                 URL: https://issues.apache.org/jira/browse/YARN-7663
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Minor
>              Labels: patch
>         Attachments: YARN-7663_1.patch, YARN-7663_2.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at
KILLED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> if insert sleep before where the START event was created, this bug will deterministically
reproduce. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message