hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
Date Tue, 10 Sep 2013 23:11:52 GMT

    [ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763670#comment-13763670 ]

Zhijie Shen commented on YARN-1149:
-----------------------------------

I investigated the problem and have three comments on the patch:

1. The following transition seems to be unnecessary. APPLICATION_LOG_HANDLING_FINISHED can be emitted, at the earliest, after APPLICATION_STARTED is handled, and by then the Application is already at INITING, so the event should never arrive while the Application is at NEW.
{code}
+          .addTransition(ApplicationState.NEW, ApplicationState.FINISHED,
+              ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED,
+              new AppShutDownTransition())
{code}
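
To illustrate point 1, here is a minimal sketch in plain Java. It does not use Hadoop's StateMachineFactory; the class name AppStateSketch, the reduced set of states and events, and the transition table are simplifying assumptions, not the real ApplicationImpl table. It only models the ordering argument: the log handler can report completion only after the application has left NEW, and an event with no registered transition for the current state is rejected, as in the reported stack trace.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the NM application state machine.
final class AppStateSketch {
    enum State { NEW, INITING, RUNNING, FINISHED }
    enum Event {
        INIT_APPLICATION, APPLICATION_INITED,
        APPLICATION_FINISHED, APPLICATION_LOG_HANDLING_FINISHED
    }

    private final Map<State, Map<Event, State>> table =
        new EnumMap<State, Map<Event, State>>(State.class);
    private State current = State.NEW;

    // Registers a transition: in state 'from', event 'on' moves to state 'to'.
    AppStateSketch addTransition(State from, State to, Event on) {
        Map<Event, State> row = table.get(from);
        if (row == null) {
            row = new EnumMap<Event, State>(Event.class);
            table.put(from, row);
        }
        row.put(on, to);
        return this;
    }

    State getState() {
        return current;
    }

    // Applies the event, or rejects it if no transition is registered.
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    // Assumed, reduced transition table (the real one has more states).
    static AppStateSketch simplified() {
        return new AppStateSketch()
            .addTransition(State.NEW, State.INITING, Event.INIT_APPLICATION)
            .addTransition(State.INITING, State.RUNNING, Event.APPLICATION_INITED)
            .addTransition(State.RUNNING, State.FINISHED, Event.APPLICATION_FINISHED)
            .addTransition(State.FINISHED, State.FINISHED,
                Event.APPLICATION_LOG_HANDLING_FINISHED);
    }

    public static void main(String[] args) {
        AppStateSketch app = simplified();
        app.handle(Event.INIT_APPLICATION);   // the app leaves NEW here
        app.handle(Event.APPLICATION_INITED); // now RUNNING
        try {
            // Log handling finishes early, e.g. because the NM is stopping.
            app.handle(Event.APPLICATION_LOG_HANDLING_FINISHED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, a transition registered for NEW would never fire, while INITING and RUNNING are the states that actually need one.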

2. The following message does not seem to cover all the cases:
{code}
+      LOG.info("Application " + app.getAppId() +
+          " is shutted down since NodeManager has been killed.");
{code}
In the normal case, APPLICATION_LOG_HANDLING_FINISHED is emitted after APPLICATION_FINISHED is handled, when the Application is already at FINISHED. There are two exceptions:
(a) The NM is stopping, and the running log aggregation job is signaled to stop early. In this case, the log message makes sense.
(b) The running log aggregation job is interrupted. See the following code:
{code}
    while (!this.appFinishing.get()) {
      synchronized(this) {
        try {
          wait(THREAD_SLEEP_TIME);
        } catch (InterruptedException e) {
          LOG.warn("PendingContainers queue is interrupted");
          this.appFinishing.set(true);
        }
      }
    }
{code}
In this case, the message does not seem to be correct, since the interruption is not necessarily caused by the NodeManager being killed.
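
One possible way to make the message accurate is to record why the wait loop ended and to choose the log text from that. The sketch below is only an illustration under assumptions: StopReasonSketch, StopReason, finish, awaitFinish, describe and the message wordings are all hypothetical and are not part of AppLogAggregatorImpl or of the patch.

```java
// Hypothetical sketch: remember why log aggregation stopped waiting.
final class StopReasonSketch {
    enum StopReason { APP_FINISHED, NM_STOPPING, INTERRUPTED }

    private static final long THREAD_SLEEP_TIME = 1000L;
    private StopReason stopReason; // guarded by 'this'

    // Called by whoever ends aggregation; the first recorded reason wins.
    synchronized void finish(StopReason reason) {
        if (stopReason == null) {
            stopReason = reason;
        }
        notifyAll();
    }

    // Blocks until a reason is recorded; an interrupt is recorded as its own reason.
    synchronized StopReason awaitFinish() {
        while (stopReason == null) {
            try {
                wait(THREAD_SLEEP_TIME);
            } catch (InterruptedException e) {
                stopReason = StopReason.INTERRUPTED;
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }
        return stopReason;
    }

    // Assumed wordings, chosen per reason instead of one fixed message.
    static String describe(String appId, StopReason reason) {
        switch (reason) {
            case NM_STOPPING:
                return "Application " + appId
                    + " is shut down since NodeManager is stopping.";
            case INTERRUPTED:
                return "Application " + appId
                    + " is shut down since log aggregation was interrupted.";
            default:
                return "Application " + appId + " finished log handling.";
        }
    }

    public static void main(String[] args) {
        StopReasonSketch aggregator = new StopReasonSketch();
        aggregator.finish(StopReason.NM_STOPPING);
        System.out.println(describe("app_1", aggregator.awaitFinish()));
    }
}
```

Whether the reason should travel with the event or be queried from the aggregator is a design choice this sketch leaves open.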

3. Should we do the following in AppShutDownTransition as well? Once APPLICATION_LOG_HANDLING_FINISHED is consumed by AppShutDownTransition, the FINISHED->FINISHED transition on APPLICATION_LOG_HANDLING_FINISHED will never happen, so the app will stay in the context forever.
{code}
      app.context.getApplications().remove(appId);
      app.aclsManager.removeApplication(appId);
{code}
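
The concern in point 3 can be sketched as follows. This is an assumption-laden illustration: AppCleanupSketch and its methods are hypothetical, and two plain maps stand in for the NM context's application map and the ACL manager. The idea is that both the normal path and the shutdown path call the same cleanup helper, so the app is removed whichever transition consumes the event.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: share the cleanup between both transitions.
final class AppCleanupSketch {
    // Stand-ins for context.getApplications() and the ACL manager's registry.
    final Map<String, String> applications = new ConcurrentHashMap<String, String>();
    final Map<String, String> acls = new ConcurrentHashMap<String, String>();

    void register(String appId, String user) {
        applications.put(appId, user);
        acls.put(appId, user);
    }

    // The cleanup that must run exactly where the event is consumed.
    private void removeApp(String appId) {
        applications.remove(appId);
        acls.remove(appId);
    }

    // Normal path: the event arrives when the app is already FINISHED.
    void onLogHandlingFinishedAtFinished(String appId) {
        removeApp(appId);
    }

    // Shutdown path: the event arrives earlier and is consumed here instead.
    void onLogHandlingFinishedDuringShutdown(String appId) {
        removeApp(appId);
    }

    public static void main(String[] args) {
        AppCleanupSketch nm = new AppCleanupSketch();
        nm.register("app_1", "foo");
        nm.onLogHandlingFinishedDuringShutdown("app_1");
        System.out.println("apps left in context: " + nm.applications.size());
    }
}
```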
                
> NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1149
>                 URL: https://issues.apache.org/jira/browse/YARN-1149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ramya Sunil
>            Assignee: Xuan Gong
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-1149.1.patch
>
>
> When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
> {noformat}
> 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118
> 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/<host>_45454.tmp
> 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118
> 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_000004. Current good log dirs are /tmp/yarn/local
> 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118
> 2013-08-25 20:45:00,925 WARN  application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-08-25 20:45:00,926 INFO  application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null
> 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
> 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping server on 8040
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
