hadoop-yarn-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
Date Thu, 12 Sep 2013 01:15:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765048#comment-13765048
] 

Xuan Gong commented on YARN-1149:
---------------------------------

The new patch addresses several other issues:
1. It adds more transitions to FINISHING_CONTAINERS_WAIT, APPLICATION_RESOURCES_CLEANINGUP and
FINISHED.
2. When the applications start to shut down, it is quite possible that other applications
are still being added to the context.
{code}
setBlockNewContainerRequests(true);
{code}
Set this at the beginning of ContainerManager::serviceStop() to block any new container requests.

{code}
          try {
            Thread.sleep(1000);
            this.handle(
                new CMgrCompletedAppsEvent(new ArrayList<ApplicationId>(
                    applications.keySet()),
                    CMgrCompletedAppsEvent.Reason.ON_SHUTDOWN));
          } catch (InterruptedException ex) {
            LOG.warn("Interrupted while sleeping on applications finish on shutdown",
              ex);
          }
{code}

Do the same in the ShutDown block and the Resync block. Old applications (those already in
the context that have already received the FINISH_APPLICATION event) will ignore the event,
since they are already in FINISHING_CONTAINERS_WAIT or APPLICATION_RESOURCES_CLEANINGUP.
Newly added applications will start to shut down when they receive the
FINISH_APPLICATION event.
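
To make the intended behavior concrete, here is a minimal standalone sketch, not the patch itself. The class ShutdownSketch, its methods, and the reduced state set are hypothetical names for illustration only; the real ApplicationImpl state machine has more states and events, and RUNNING does not always move to FINISHING_CONTAINERS_WAIT. The sketch only shows the two ideas above: reject new container requests once stop begins, and let applications that are already finishing ignore a repeated FINISH_APPLICATION.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; these are NOT the real NodeManager classes.
public class ShutdownSketch {
    // Reduced subset of the application states named in this comment.
    enum AppState { RUNNING, FINISHING_CONTAINERS_WAIT, APPLICATION_RESOURCES_CLEANINGUP }

    // States in which a repeated FINISH_APPLICATION is ignored.
    static final Set<AppState> IGNORES_FINISH = EnumSet.of(
        AppState.FINISHING_CONTAINERS_WAIT,
        AppState.APPLICATION_RESOURCES_CLEANINGUP);

    // Assumption: a RUNNING app always moves to FINISHING_CONTAINERS_WAIT here.
    static AppState onFinishApplication(AppState current) {
        return IGNORES_FINISH.contains(current)
            ? current
            : AppState.FINISHING_CONTAINERS_WAIT;
    }

    private final Map<String, AppState> applications = new LinkedHashMap<>();
    private boolean blockNewContainerRequests = false;

    // Returns false when the request is rejected because stop has begun.
    boolean startContainer(String appId) {
        if (blockNewContainerRequests) {
            return false;
        }
        applications.putIfAbsent(appId, AppState.RUNNING);
        return true;
    }

    // Delivers FINISH_APPLICATION to every application in the context.
    void finishAll() {
        applications.replaceAll((id, state) -> onFinishApplication(state));
    }

    // Block new requests first, then finish whatever is in the context.
    void serviceStop() {
        blockNewContainerRequests = true;
        finishAll();
    }

    AppState stateOf(String appId) {
        return applications.get(appId);
    }

    public static void main(String[] args) {
        ShutdownSketch nm = new ShutdownSketch();
        nm.startContainer("app_old");
        nm.finishAll();               // app_old has already received FINISH_APPLICATION
        nm.startContainer("app_new"); // added while shutdown is starting
        nm.serviceStop();
        System.out.println("app_old: " + nm.stateOf("app_old"));
        System.out.println("app_new: " + nm.stateOf("app_new"));
        System.out.println("late request accepted: " + nm.startContainer("app_late"));
    }
}
```

In this model the second FINISH_APPLICATION is a no-op for app_old, while app_new leaves RUNNING, and any request arriving after serviceStop() has begun is rejected.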
                
> NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1149
>                 URL: https://issues.apache.org/jira/browse/YARN-1149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ramya Sunil
>            Assignee: Xuan Gong
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch
>
>
> When the NodeManager receives a kill signal after an application has finished execution but
> before log aggregation has kicked in, InvalidStateTransitonException: Invalid event:
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown.
> {noformat}
> 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118
> 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/<host>_45454.tmp
> 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118
> 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_000004. Current good log dirs are /tmp/yarn/local
> 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118
> 2013-08-25 20:45:00,925 WARN  application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-08-25 20:45:00,926 INFO  application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null
> 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
> 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping server on 8040
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
