hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
Date Mon, 04 Feb 2013 19:44:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570568#comment-13570568 ]

Jason Lowe commented on YARN-376:
---------------------------------

There appears to be a race condition in the RM's handling of finished applications that may
explain this.  When a node heartbeats, ResourceTrackerService sends the list of finished
applications to the node and then posts a status update event to the RMNodeImpl
that corresponds to the node.  The RMNodeImpl clears the entire list of finished applications
once it has processed the status update.  If an application completes *after* the ResourceTrackerService
has asynchronously retrieved the list of finished applications but *before* the status update
event is posted to the RMNodeImpl, then the application will be added to and then cleared from
the list of finished applications before the ResourceTrackerService has had a chance to notify
the node of the completed application.
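
The interleaving described above can be sketched as a small deterministic model. This is
only an illustration in Python, not the Hadoop code: NodeModel, heartbeat, and simulate are
hypothetical stand-ins for RMNodeImpl and the heartbeat handling in ResourceTrackerService,
and the asynchronous event dispatch is flattened into a fixed ordering so that the losing
interleaving always occurs.

```python
# Deterministic model of the suspected race.
# All names are illustrative stand-ins, not the real Hadoop classes or APIs.


class NodeModel:
    """Stand-in for the RM-side node object (RMNodeImpl in the comment)."""

    def __init__(self):
        self.finished_apps = []

    def app_finished(self, app_id):
        # The RM records that an application on this node has completed.
        self.finished_apps.append(app_id)

    def on_status_update(self):
        # As described in the comment: the entire list is cleared, whether or
        # not every entry was actually sent to the node.
        self.finished_apps.clear()


def heartbeat(node, apps_finishing_in_window=()):
    """Stand-in for heartbeat handling; returns the app IDs sent to the node."""
    # Step 1: snapshot the finished-apps list for the heartbeat response.
    to_report = list(node.finished_apps)
    # Race window: applications complete after the snapshot was taken.
    for app_id in apps_finishing_in_window:
        node.app_finished(app_id)
    # Step 2: the status update is processed and the whole list is cleared.
    node.on_status_update()
    return to_report


def simulate():
    node = NodeModel()
    node.app_finished("app_1")
    # app_2 completes inside the window of the first heartbeat.
    first = heartbeat(node, apps_finishing_in_window=["app_2"])
    second = heartbeat(node)
    return first, second


if __name__ == "__main__":
    first, second = simulate()
    print("first heartbeat reports:", first)
    print("second heartbeat reports:", second)
```

In this model app_2 is recorded as finished but is cleared before any heartbeat response
carries it, so the node is never told about it.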
                
> Apps that have completed can appear as RUNNING on the NM UI
> -----------------------------------------------------------
>
>                 Key: YARN-376
>                 URL: https://issues.apache.org/jira/browse/YARN-376
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.3-alpha, 0.23.6
>            Reporter: Jason Lowe
>
> On a busy cluster we've noticed a growing number of applications that appear as RUNNING on
a nodemanager's web pages even though the applications have long since finished.  Looking at the NM
logs, it appears the RM never told the nodemanager that the application had finished.  This
is also reflected in a jstack of the NM process, since many more log aggregation threads are
running than one would expect from the number of actively running applications.

