hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-8130) Race condition when container events are published for KILLED applications
Date Tue, 08 May 2018 11:05:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-8130:
------------------------------------
    Attachment: YARN-8130.01.patch

> Race condition when container events are published for KILLED applications
> --------------------------------------------------------------------------
>
>                 Key: YARN-8130
>                 URL: https://issues.apache.org/jira/browse/YARN-8130
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: ATSv2
>            Reporter: Charan Hebri
>            Priority: Major
>         Attachments: YARN-8130.01.patch
>
>
> There seems to be a race condition when an application is KILLED while the corresponding
> container event information is being published. For completed containers, a YARN_CONTAINER_FINISHED
> event is generated, but for some containers in a KILLED application this event is missing. Below
> is a NodeManager log snippet:
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application application_1523259757659_0003 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been removed before the entity could be published for TimelineEntity[type='YARN_CONTAINER', id='container_1523259757659_0003_01_000002']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for container container_1523259757659_0003_01_000001. Current good log dirs are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for container container_1523259757659_0003_01_000002. Current good log dirs are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(192)) - The collector service for application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1572)) - couldn't find application application_1523259757659_0003 while processing FINISH_APPS event. The ResourceManager allocated resources for this application to the NodeManager but no active containers were found to process
> {code}
> The container ID shown in the log, *container_1523259757659_0003_01_000002*, is
> the one whose finished event is missing.
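
The ordering suggested by the log can be illustrated with a small, deterministic sketch. This is not the NodeManager code: `ToyTimelinePublisher`, `replay`, and the step names are hypothetical, and the model only assumes what the ERROR line implies, namely that a per-application client is removed during application cleanup and that a container event arriving afterwards cannot be published.

```python
class ToyTimelinePublisher:
    """Toy stand-in for a per-application timeline publisher (hypothetical)."""

    def __init__(self):
        self.clients = {}   # app_id -> list of events published for that app
        self.dropped = []   # container ids whose event arrived too late

    def app_started(self, app_id):
        self.clients[app_id] = []

    def app_finished(self, app_id):
        # Application cleanup removes the per-application client.
        self.clients.pop(app_id, None)

    def publish_container_finished(self, app_id, container_id):
        client = self.clients.get(app_id)
        if client is None:
            # Analogous to the ERROR line in the log: the event is lost.
            self.dropped.append(container_id)
            return False
        client.append(("YARN_CONTAINER_FINISHED", container_id))
        return True


def replay(order):
    """Run the two competing steps in the given order.

    Returns (published, dropped): whether the finished event was published,
    and the list of container ids whose event was dropped.
    """
    app = "application_1523259757659_0003"
    container = "container_1523259757659_0003_01_000002"
    pub = ToyTimelinePublisher()
    pub.app_started(app)
    published = None
    for step in order:
        if step == "publish":
            published = pub.publish_container_finished(app, container)
        elif step == "finish_app":
            pub.app_finished(app)
    return published, pub.dropped
```

Calling `replay(["publish", "finish_app"])` models the normal case, where the event is published before cleanup. Calling `replay(["finish_app", "publish"])` models the ordering seen in the log, where the application reaches FINISHED first and the container event is dropped.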



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

