hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7835) [Atsv2] Race condition in NM while publishing events if second attempt launched on same node
Date Mon, 05 Feb 2018 10:58:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352238#comment-16352238 ]

Rohith Sharma K S commented on YARN-7835:

bq. An alternative would be to only clean up the collector when the application finishes
instead of when an AM container finishes
It is doable and should be fine! One concern, from a very rare scenario, is that the collector
map entry will be retained until the application_stop event is triggered. Take an example where
the 1st attempt is running on Node-1 and is killed. The 2nd attempt starts on a different node, but
Node-1 doesn't get the application_stop event since the application is still running, so Node-1
keeps this map entry. Once the application finishes, the entry is removed, but for a long-running
application it is retained in two NodeManagers. This would become a gradual leak
in the case of long-running applications.
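
To make the concern concrete, here is a minimal sketch of the alternative policy (clean up only on application stop). It is a toy model, not NodeManager code: the class CollectorRetentionSketch, its method names, and the node and application IDs are all hypothetical, and it assumes the stop event reaches every node once the application finishes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; none of these names exist in Hadoop.
class CollectorRetentionSketch {
    // nodeId -> applications for which that node still holds a collector entry
    private final Map<String, Set<String>> collectorsByNode = new HashMap<>();

    // An AM container starting on a node registers a collector entry there.
    void amContainerStarted(String nodeId, String appId) {
        collectorsByNode.computeIfAbsent(nodeId, n -> new HashSet<>()).add(appId);
    }

    // Under the alternative policy, an AM container finishing removes nothing.
    void amContainerFinished(String nodeId, String appId) {
        // intentionally no cleanup here
    }

    // Assumption: once the application finishes, every node sees the stop event.
    void applicationStopped(String appId) {
        for (Set<String> apps : collectorsByNode.values()) {
            apps.remove(appId);
        }
    }

    // Number of nodes that currently hold an entry for the application.
    int nodesHolding(String appId) {
        int count = 0;
        for (Set<String> apps : collectorsByNode.values()) {
            if (apps.contains(appId)) {
                count++;
            }
        }
        return count;
    }

    // Replays the scenario from the comment; returns {while running, after stop}.
    static int[] replay() {
        CollectorRetentionSketch cluster = new CollectorRetentionSketch();
        String app = "application_1";                // hypothetical ID
        cluster.amContainerStarted("Node-1", app);   // 1st attempt on Node-1
        cluster.amContainerFinished("Node-1", app);  // 1st attempt killed
        cluster.amContainerStarted("Node-2", app);   // 2nd attempt on another node
        int whileRunning = cluster.nodesHolding(app);
        cluster.applicationStopped(app);             // application finally finishes
        int afterStop = cluster.nodesHolding(app);
        return new int[] {whileRunning, afterStop};
    }

    public static void main(String[] args) {
        int[] result = replay();
        System.out.println("nodes holding entry while running: " + result[0]); // 2
        System.out.println("nodes holding entry after stop: " + result[1]);    // 0
    }
}
```

In this model the entry on Node-1 survives for as long as the application runs, which is the gradual leak described above for long-running applications.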

> [Atsv2] Race condition in NM while publishing events if second attempt launched on same
> --------------------------------------------------------------------------------------------
>                 Key: YARN-7835
>                 URL: https://issues.apache.org/jira/browse/YARN-7835
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-7835.001.patch
> A race condition is observed: if the master container is killed for some reason and
> launched on the same node, NMTimelinePublisher doesn't add a timelineClient. But once the completed
> container for the 1st attempt arrives, NMTimelinePublisher removes the timelineClient.
> This causes all subsequent event publishing from a different client to fail with the
> exception "Application is not found".
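
The ordering described in the report can be replayed deterministically with a small model. This is only a sketch under stated assumptions: PublisherRaceSketch and its methods are hypothetical stand-ins, not the real NMTimelinePublisher API, and the map is assumed to be keyed by application ID alone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simplified model; it does not mirror the real NMTimelinePublisher.
class PublisherRaceSketch {
    // Assumption: one client entry per application ID.
    private final Map<String, String> appToClient = new ConcurrentHashMap<>();

    // Called when an AM container starts on this node.
    void onAmContainerStart(String appId, String attemptId) {
        // An existing entry is kept, so a second attempt on the same node adds nothing.
        appToClient.putIfAbsent(appId, "client-for-" + attemptId);
    }

    // Called when an AM container completion is processed on this node.
    void onAmContainerFinish(String appId) {
        // Removal is keyed by application only, so it also drops the entry
        // that the newer attempt depends on.
        appToClient.remove(appId);
    }

    // Returns true if a client is registered for the application, false otherwise.
    boolean publish(String appId) {
        return appToClient.containsKey(appId);
    }

    // Replays the reported ordering; returns {publish before, publish after}.
    static boolean[] replay() {
        PublisherRaceSketch nm = new PublisherRaceSketch();
        String app = "application_1";            // hypothetical ID
        nm.onAmContainerStart(app, "attempt_1"); // 1st attempt starts on this node
        nm.onAmContainerStart(app, "attempt_2"); // 2nd attempt starts on the same node
        boolean before = nm.publish(app);
        nm.onAmContainerFinish(app);             // late completion of the 1st attempt
        boolean after = nm.publish(app);
        return new boolean[] {before, after};
    }

    public static void main(String[] args) {
        boolean[] result = replay();
        System.out.println("publish before late completion: " + result[0]); // true
        System.out.println("publish after late completion: " + result[1]);  // false
    }
}
```

Once the late completion is processed, every publish for the still-running application fails in this model, matching the "Application is not found" symptom.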

This message was sent by Atlassian JIRA
