hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
Date Wed, 11 May 2016 20:09:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280731#comment-15280731 ]

Jason Lowe commented on YARN-4325:

Thanks, Junping!  The test failure is related.  In addition to the javac warning that should
be cleaned up, it looks like there's an unlikely code path in NonAggregatingLogHandler where,
if we fail to look up the appId, it doesn't respond to the APPLICATION_FINISHED event.
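
The code path described above can be illustrated with a minimal sketch. It is not the real NonAggregatingLogHandler: the class SketchLogHandler, its methods, and the use of plain strings as events are all hypothetical stand-ins, and the real handler does more work on the normal path before reporting completion. The point is only that the failed-lookup branch still sends a response, so the application is never left waiting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a log handler; not the Hadoop class.
class SketchLogHandler {
    // appId -> owning user, recorded when the application starts
    private final Map<String, String> appOwners = new HashMap<>();
    // Responses sent back to the application, recorded here for inspection
    final List<String> dispatched = new ArrayList<>();

    void onApplicationStarted(String appId, String user) {
        appOwners.put(appId, user);
    }

    void onApplicationFinished(String appId) {
        String user = appOwners.remove(appId);
        if (user == null) {
            // Lookup failed. Returning silently here is the problem case:
            // the application would never hear back. Respond with a failure.
            dispatched.add("APPLICATION_LOG_HANDLING_FAILED:" + appId);
            return;
        }
        // Normal path, collapsed to a single response for this sketch.
        dispatched.add("APPLICATION_LOG_FINISHED:" + appId);
    }
}
```

With this shape, every APPLICATION_FINISHED event produces exactly one response, whether or not the appId was known to the handler.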

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> ------------------------------------------------------------------------
>                 Key: YARN-4325
>                 URL: https://issues.apache.org/jira/browse/YARN-4325
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: ApplicationImpl.PNG, YARN-4325-v1.1.patch, YARN-4325-v1.patch, YARN-4325-v2.patch,
> From a long-running cluster, we found tens of thousands of stale apps still being recovered during NM restart recovery.
> After investigating, we found three issues that cause app state to leak in the NM state-store:
> 1. APPLICATION_LOG_HANDLING_FAILED is not handled by removing the app from NMStateStore.
> 2. The APPLICATION_LOG_HANDLING_FAILED event is not sent when the aggregator's doAppLogAggregation() hits an exception.
> 3. Only an Application in FINISHED status that receives APPLICATION_LOG_FINISHED has a transition to remove the app from the NM state store. An Application in another status, such as APPLICATION_RESOURCES_CLEANUP, ignores the event and never removes the app from the NM state store, even after the app has finished.
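
The leak conditions listed in the description can be modelled with a small state-machine sketch. This is an assumption-laden illustration, not the YARN-4325 patch: SketchApplication, SketchStateStore, the enums, and the FINISH_APPLICATION and APPLICATION_RESOURCES_CLEANED events are hypothetical, and only the state and log-handling event names are taken from the description. It shows one possible policy in which both log-handling outcomes purge the app from the store regardless of the application's current state.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical states and events; names partly borrowed from the description.
enum SketchAppState { RUNNING, APPLICATION_RESOURCES_CLEANUP, FINISHED }

enum SketchAppEvent {
    FINISH_APPLICATION,
    APPLICATION_RESOURCES_CLEANED,
    APPLICATION_LOG_FINISHED,
    APPLICATION_LOG_HANDLING_FAILED
}

// Hypothetical in-memory stand-in for the NM state store.
class SketchStateStore {
    private final Set<String> apps = new HashSet<>();
    void storeApplication(String appId) { apps.add(appId); }
    void removeApplication(String appId) { apps.remove(appId); }
    boolean contains(String appId) { return apps.contains(appId); }
}

class SketchApplication {
    private final String appId;
    private final SketchStateStore store;
    private SketchAppState state = SketchAppState.RUNNING;

    SketchApplication(String appId, SketchStateStore store) {
        this.appId = appId;
        this.store = store;
        store.storeApplication(appId);
    }

    SketchAppState getState() { return state; }

    void handle(SketchAppEvent event) {
        switch (event) {
            case FINISH_APPLICATION:
                state = SketchAppState.APPLICATION_RESOURCES_CLEANUP;
                break;
            case APPLICATION_RESOURCES_CLEANED:
                state = SketchAppState.FINISHED;
                break;
            case APPLICATION_LOG_FINISHED:
            case APPLICATION_LOG_HANDLING_FAILED:
                // Purge on either outcome and in any state, so an app that is
                // still cleaning up resources does not leak its stored state.
                store.removeApplication(appId);
                break;
        }
    }
}
```

In this model an app that receives APPLICATION_LOG_FINISHED while still in APPLICATION_RESOURCES_CLEANUP is removed from the store immediately, and a failed log-handling result is treated the same way as a successful one.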

This message was sent by Atlassian JIRA
