hadoop-yarn-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state
Date Thu, 02 Aug 2018 20:07:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567434#comment-16567434 ]

Robert Kanter commented on YARN-4946:
-------------------------------------

I'm not sure whether any of the 3 versions of the ATS store the log aggregation status info.  But
I agree that we shouldn't add a dependency on the ATS if possible.  I also think it makes sense for
the RM to remember Applications while they're still doing something, including log aggregation.

Thanks for the patch [~snemeth]; a couple of things:
# I'm not sure creating so many helper methods is necessary, especially the ones that are
only one or two lines of code, like {{recordLogAggregationStartTime}}.
# The current approach changes when an App is considered finished ({{APP_COMPLETED}}),
delaying it until log aggregation has finished.  That could take minutes after the
App actually finishes, so this would add a considerable delay to a lot of other things,
which users will definitely notice.  I think we should limit the scope of the
changes: leave the App lifecycle as-is, and only change the part where we decide
to evict an App from the RM.
#- More specifically, {{RMAppManager#checkAppNumCompletedLimit}} compares a counter of
completed apps against the configured maximum.  We can simply adjust that logic, or the
counter, so that an App is only counted once it has completed _and_ its log aggregation
has finished.
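A minimal sketch of the counter change suggested in the last point, assuming a simplified in-memory model of the RM's completed-app list. {{CompletedAppTracker}}, {{LogAggStatus}}, {{appCompleted}} and {{updateLogAggStatus}} are hypothetical names, not the real {{RMAppManager}} code; only the idea (count and evict an App only once it is completed _and_ its log aggregation is in a terminal state) comes from this comment.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the RM's completed-app bookkeeping.
// These are NOT real YARN classes; LogAggStatus only mirrors the terminal
// states named in YARN-4946 (SUCCEEDED, FAILED, TIME_OUT) plus RUNNING.
class CompletedAppTracker {

  enum LogAggStatus {
    RUNNING, SUCCEEDED, FAILED, TIME_OUT;

    boolean isTerminal() {
      return this == SUCCEEDED || this == FAILED || this == TIME_OUT;
    }
  }

  private final int maxCompletedApps;
  // Completed apps in completion order, mapped to their current log aggregation status.
  private final Map<String, LogAggStatus> completedApps = new LinkedHashMap<>();

  CompletedAppTracker(int maxCompletedApps) {
    this.maxCompletedApps = maxCompletedApps;
  }

  // Called when an app reaches APP_COMPLETED; the app lifecycle itself is unchanged.
  void appCompleted(String appId, LogAggStatus status) {
    completedApps.put(appId, status);
    checkAppNumCompletedLimit();
  }

  // Called when a new log aggregation status arrives for a completed app.
  void updateLogAggStatus(String appId, LogAggStatus status) {
    completedApps.replace(appId, status); // no-op if the app was already evicted
    checkAppNumCompletedLimit();
  }

  // Only apps that are completed AND whose log aggregation is terminal are
  // counted against the limit, and only those apps are evicted (oldest first).
  // Apps that are still aggregating logs stay in memory.
  private void checkAppNumCompletedLimit() {
    long evictable = completedApps.values().stream()
        .filter(LogAggStatus::isTerminal)
        .count();
    Iterator<LogAggStatus> it = completedApps.values().iterator();
    while (evictable > maxCompletedApps && it.hasNext()) {
      if (it.next().isTerminal()) {
        it.remove();
        evictable--;
      }
    }
  }

  boolean isRemembered(String appId) {
    return completedApps.containsKey(appId);
  }

  int size() {
    return completedApps.size();
  }
}
```

Because the counter ignores Apps whose log aggregation is still RUNNING, this check never evicts them, so a client can still ask the RM for their status. The trade-off in this sketch is that the RM may briefly hold more completed Apps than the configured maximum.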

> RM should not consider an application as COMPLETED when log aggregation is not in a terminal state
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4946
>                 URL: https://issues.apache.org/jira/browse/YARN-4946
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-4946.001.patch, YARN-4946.002.patch
>
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each Yarn App
> into a HAR file.  When run, it seeds its list of candidate apps from the aggregated logs directory,
> and then filters out ineligible apps.  One of the criteria involves checking with the RM that
> an Application's log aggregation status is not still running and has not failed.  When the
> RM "forgets" about an older completed Application (e.g. RM failover, enough time has passed,
> etc.), the tool won't find the Application in the RM and will just assume that its log aggregation
> succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus remove it from its history)
> until its log aggregation status has reached a terminal state (e.g. SUCCEEDED, FAILED, TIME_OUT).
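The eligibility check described in the issue above can be modelled roughly as follows, to show where the false positive comes from. {{HarEligibilitySketch}}, {{RmClient}} and {{LogAggStatus}} are hypothetical stand-ins written from the issue description, not the actual MAPREDUCE-6415 tool code.

```java
import java.util.Optional;

// Hypothetical model of the eligibility check described in YARN-4946;
// this is NOT the actual MAPREDUCE-6415 tool code.
class HarEligibilitySketch {

  enum LogAggStatus { RUNNING, SUCCEEDED, FAILED, TIME_OUT }

  // Stand-in for asking the RM about an app; empty once the RM has forgotten it.
  interface RmClient {
    Optional<LogAggStatus> logAggregationStatus(String appId);
  }

  static boolean isEligible(RmClient rm, String appId) {
    return rm.logAggregationStatus(appId)
        // Known app: eligible only if aggregation is not running and has not failed.
        .map(s -> s != LogAggStatus.RUNNING && s != LogAggStatus.FAILED)
        // Unknown app: silently assumed to have succeeded, which is the problem.
        .orElse(true);
  }
}
```

Keeping Apps in the RM until their log aggregation status is terminal means the empty branch is only reached for Apps whose status is genuinely settled.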



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


