hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4882) Change the log level to DEBUG for recovering completed applications
Date Wed, 30 Mar 2016 17:13:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218340#comment-15218340 ]

Jason Lowe commented on YARN-4882:
----------------------------------

The problem with moving this logging to DEBUG is that debugging recovery is often a postmortem
process: trying to figure out why a particular application wasn't recovered or didn't end up
in the right state.  For the same reason, logging a single rollup value for all apps is not
going to be very helpful when the question is what happened to a particular app.  By the time
one wants to re-run the scenario with DEBUG turned on, the state store is going to be
significantly different.

I agree the amount of logging is a bit excessive today and can be reduced, but I also think
it's important to track how active applications were recovered.  I can think of a couple of
potential options:

1) Reduce the logging for completed applications, potentially just one line per app or just
a rollup number of completed apps. Completed applications dominate the current recovery logging
and are less interesting.   For debugging, one can presume that an app that wasn't logged
during recovery either was missing from the state store or was recovered as completed.

2) Use a separate logger for recovery activity, which allows users to control the amount of
logging pertaining to recovery (i.e., via log level) and potentially to use a separate log file
just for recovery.

And of course we could do some combination of the above options, both reducing the recovery
logging we have today and allowing the user to control it separately from normal RM logging.
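
To make the combined option concrete, here is a minimal sketch, not the actual RM code. It
assumes a dedicated logger named "rm.recovery" (a hypothetical name) and uses java.util.logging
only so that the example is self-contained; the class, method, and second application id are
likewise made up for illustration. Active apps get one INFO line each, completed apps get a
single rollup line, and the per-app detail for completed apps is only emitted at FINE.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

class RecoveryLogSketch {
    // Hypothetical logger name, kept separate from normal RM logging so its
    // level (and output destination) can be configured independently.
    static final Logger RECOVERY_LOG = Logger.getLogger("rm.recovery");

    /**
     * Logs recovery results. The map goes from application id to its stored
     * final state; a null value means the app was still active.
     * Returns the number of completed apps that were rolled up.
     */
    static int logRecovered(Map<String, String> finalStateByApp) {
        int completed = 0;
        for (Map.Entry<String, String> e : finalStateByApp.entrySet()) {
            if (e.getValue() == null) {
                // Active apps are the interesting ones: always one line each.
                RECOVERY_LOG.info("Recovering active app: " + e.getKey());
            } else {
                completed++;
                // Per-app detail for completed apps is only visible at FINE.
                RECOVERY_LOG.fine("Recovered completed app: " + e.getKey()
                        + " with final state = " + e.getValue());
            }
        }
        // One rollup line instead of several lines per completed app.
        RECOVERY_LOG.info("Recovered " + completed + " completed applications");
        return completed;
    }

    public static void main(String[] args) {
        Map<String, String> apps = new LinkedHashMap<>();
        apps.put("application_1456298208485_21507", "FINISHED");
        apps.put("application_1456298208485_21508", null); // hypothetical active app
        logRecovered(apps);
    }
}
```

With the default INFO level this produces one line per active app plus one rollup line. Raising
only the recovery logger to FINE brings back the per-app lines for completed apps without
touching the rest of the logging. In a log4j-based setup the same idea would presumably be
expressed as a named logger with its own level and appender, which is what would allow a
separate recovery log file.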

> Change the log level to DEBUG for recovering completed applications
> -------------------------------------------------------------------
>
>                 Key: YARN-4882
>                 URL: https://issues.apache.org/jira/browse/YARN-4882
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Daniel Templeton
>
> I think there is no need to log the recovery of completed applications at INFO; it can be
> logged at DEBUG instead.  The problem seen on large clusters is that if an issue occurs during
> RM startup and the RM keeps switching, the RM logs fill up mostly with messages about
> recovering applications.
> Six lines are logged per application, as shown in the logs below, and the RM default value
> for max-completed applications is 10K. So each switch adds 10K*6=60K lines, which I feel is
> not useful.
> {noformat}
> 2016-03-01 10:20:59,077 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager:
Default priority level is set to application:application_1456298208485_21507
> 2016-03-01 10:20:59,094 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
Recovering app: application_1456298208485_21507 with 1 attempts and final state = FINISHED
> 2016-03-01 10:20:59,100 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Recovering attempt: appattempt_1456298208485_21507_000001 with final state: FINISHED
> 2016-03-01 10:20:59,107 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1456298208485_21507_000001 State change from NEW to FINISHED
> 2016-03-01 10:20:59,111 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1456298208485_21507 State change from NEW to FINISHED
> 2016-03-01 10:20:59,112 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=rohith   OPERATION=Application Finished - Succeeded      TARGET=RMAppManager     RESULT=SUCCESS
 APPID=application_1456298208485_21507
> {noformat}
> The main problem is that important information from before the RM became unstable is lost
> from the logs. Even with the log rollover count set to 50 or 100, all of those logs are rolled
> out within a short period, and the logs that remain contain only RM switching information,
> most of it about recovering applications.
> I suggest that at least the recovery of completed applications should be logged at DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
