hadoop-yarn-issues mailing list archives

From "Jiandan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-9237) RM prints a lot of "Cannot get RMApp by appId" log when RM failover
Date Fri, 25 Jan 2019 10:34:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752119#comment-16752119 ]

Jiandan Yang  edited comment on YARN-9237 at 1/25/19 10:33 AM:

Thanks [~cheersyang]  for your quick response.
{quote} change to ApplicationState.FINISHED != appEntry.getValue().getApplicationState(){quote}
I agree with you; this style is better.
After looking through the code several times, I'm not sure how to test it; maybe the existing unit tests are sufficient.
Do you have a good idea for a test?

was (Author: yangjiandan):
Thanks [~cheersyang]  for your quick response. I'll update patch according to your comment.

> RM prints a lot of "Cannot get RMApp by appId" log when RM failover
> -------------------------------------------------------------------
>                 Key: YARN-9237
>                 URL: https://issues.apache.org/jira/browse/YARN-9237
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Jiandan Yang 
>            Assignee: Jiandan Yang 
>            Priority: Major
>         Attachments: YARN-9237.001.patch, YARN-9237.002.patch
> I found many of the following log entries in the active RM's log file after an RM failover:
> {code:java}
> 2019-01-24 15:43:58,999 WARN org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Cannot get RMApp by appId=application_1542178952162_34746156, just added it to finishedApplications list for cleanup
> .....
> {code}
> I looked back through the RM logs and found that this app had finished hours earlier:
> {code:java}
> 2019-01-23 21:49:55,683 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542178952162_34746156_000001 State change from FINAL_SAVING to FINISHING
> {code}
> The RM prints "Cannot get RMApp by appId" for the following reasons:
> 1. The RM fails over.
> 2. Each NM reports all of its running apps to the RM in its register request.
> 3. The running apps are taken from NMContext, and some of them may already have finished.
> 4. In my cluster, yarn.log-aggregation-enable=false and yarn.nodemanager.log.retain-seconds=86400 (1 day), so an app is kept in NMContext for 24 hours after it has finished.
> 5. My YARN cluster runs 50k apps per day on 7k nodes, so the NMs report many finished apps to the RM.
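
For illustration only, here is a minimal, self-contained sketch of the filtering idea behind the condition quoted in the comment above: when building the list of apps to report at registration, skip entries whose state is already FINISHED. This is not the attached patch. The class RunningAppsFilter, the nested App type, the method appsToReport, and the shortened ApplicationState enum are hypothetical stand-ins for the real NodeManager types, and the app IDs are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RunningAppsFilter {
    // Hypothetical, shortened stand-in for the NM-side application state enum.
    enum ApplicationState { INITING, RUNNING, FINISHED }

    // Hypothetical stand-in for an application entry kept in the NM context.
    static final class App {
        private final ApplicationState state;
        App(ApplicationState state) { this.state = state; }
        ApplicationState getApplicationState() { return state; }
    }

    // Returns the IDs of apps that should be reported as running,
    // leaving out any app that has already reached FINISHED.
    static List<String> appsToReport(Map<String, App> nmContextApps) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, App> appEntry : nmContextApps.entrySet()) {
            // Constant-first comparison, as in the style suggested in the review.
            if (ApplicationState.FINISHED != appEntry.getValue().getApplicationState()) {
                report.add(appEntry.getKey());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up app IDs; LinkedHashMap keeps insertion order for a stable printout.
        Map<String, App> apps = new LinkedHashMap<>();
        apps.put("application_1_0001", new App(ApplicationState.RUNNING));
        apps.put("application_1_0002", new App(ApplicationState.FINISHED));
        apps.put("application_1_0003", new App(ApplicationState.INITING));
        System.out.println(appsToReport(apps));
        // prints [application_1_0001, application_1_0003]
    }
}
```

A unit test along these lines could populate a context with a mix of finished and unfinished apps and assert that only the unfinished ones appear in the reported list, though whether that matches the structure of the real code is an assumption.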

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
