hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
Date Thu, 12 Sep 2013 18:14:43 GMT

    [ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765722#comment-13765722
] 

Xuan Gong commented on YARN-1149:
---------------------------------

bq.It creates additional overhead, as the events will be sent every 1s. As the new requests
will be blocked, what we need to prevent is that while stopService() is running, startContainers()
shouldn't be on the half way, where the application is created but not added into the context.
Then, if unfortunately CMgrCompletedAppsEvent is sent at this time, the newly created application
will not be notified. Maybe we can use synchronization to prevent stopService() and startContainers()
from being interpolated

Yes, It does create some additional overhead. Maybe We can make the waiting time a little
bit longer. For using synchronization to prevent stopService() and startContainers() from
being interpolated, I am thinking that we may have the race condition problem. I think that
containermanager should start to serviceStop() immediately without waiting. Imaging that if
we have several startContainers() calls, and one serviceStop() call, we can not guarantee
when the serviceStop() will be executed. It is very possible that it need to wait for all
the startContainers() requests been finished.
                
> NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED
at RUNNING
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1149
>                 URL: https://issues.apache.org/jira/browse/YARN-1149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ramya Sunil
>            Assignee: Xuan Gong
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch
>
>
> When nodemanager receives a kill signal when an application has finished execution but
log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED
at RUNNING is thrown
> {noformat}
> 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254))
- Application just finished : application_1377459190746_0118
> 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105))
- Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/<host>_45454.tmp
> 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151))
- Waiting for aggregation to complete for application_1377459190746_0118
> 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122))
- Uploading logs for container container_1377459190746_0118_01_000004. Current good log dirs
are /tmp/yarn/local
> 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182))
- Finished aggregate log-file for app application_1377459190746_0118
> 2013-08-25 20:45:00,925 WARN  application.Application (ApplicationImpl.java:handle(427))
- Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED
at RUNNING
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)

>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
  
>         at java.lang.Thread.run(Thread.java:662)
> 2013-08-25 20:45:00,926 INFO  application.Application (ApplicationImpl.java:handle(430))
- Application application_1377459190746_0118 transitioned from RUNNING to null
> 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463))
- org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
is interrupted. Exiting.
> 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping server on
8040
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message