hadoop-mapreduce-issues mailing list archives

From "Peter Bacsko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7046) Enhance logging related to retrieving Job
Date Mon, 05 Feb 2018 11:29:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352272#comment-16352272 ]

Peter Bacsko commented on MAPREDUCE-7046:
-----------------------------------------

[~yufeigu] thanks for the comments.

bq. "In method getJob in class Cluster, the warn message could be "Failed to load job configuration!"; the second info message is not necessary."
I'll change this.
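For illustration, here is a minimal, self-contained sketch of where a single "Failed to load job configuration!" warning could sit. It is not the actual {{Cluster.getJob}} source: the class {{ClusterSketch}}, its two lookup functions, and the assumption that a missing job file surfaces as a {{RuntimeException}} wrapping a {{FileNotFoundException}} are all hypothetical, and {{java.util.logging}} stands in for Hadoop's logger so the example runs on its own.

```java
import java.io.FileNotFoundException;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for Cluster.getJob(JobID); not the Hadoop source.
class ClusterSketch {
    private static final Logger LOG = Logger.getLogger(ClusterSketch.class.getName());

    // Hypothetical lookups: job ID -> job file path (null when the job is unknown),
    // and job file path -> loaded configuration.
    private final Function<String, String> jobFileLookup;
    private final Function<String, String> configLoader;

    ClusterSketch(Function<String, String> jobFileLookup,
                  Function<String, String> configLoader) {
        this.jobFileLookup = jobFileLookup;
        this.configLoader = configLoader;
    }

    // Returns the loaded configuration, or null when the job cannot be found.
    String getJob(String jobId) {
        String jobFile = jobFileLookup.apply(jobId);
        if (jobFile == null) {
            return null;
        }
        try {
            return configLoader.apply(jobFile);
        } catch (RuntimeException ex) {
            if (ex.getCause() instanceof FileNotFoundException) {
                // A single WARN; the reviewer suggested dropping the extra INFO line.
                LOG.warning("Failed to load job configuration!");
                return null;
            }
            throw ex;
        }
    }
}
```

Only the "job file cannot be loaded" branch logs; an unknown job ID still returns {{null}} silently, which is the gap the rest of this discussion is about.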

bq. "need log for method getJob in class JobClient"
What log message would you add and where?

bq. "The new log message in class MRClientService is not necessary since an IOException will be thrown if it is null; the same applies to class HistoryClientService."
It's only thrown if {{exceptionThrown == true}}, but we call the method with {{false}}, so a missing job just yields {{null}}.
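The point about {{exceptionThrown}} can be illustrated with a small sketch. It assumes a lookup method shaped like {{verifyAndGetJob(jobId, exceptionThrown)}}; the class name, the map-based job registry and the log text are hypothetical. With {{false}}, a missing job produces {{null}} rather than an {{IOException}}, so without a log statement at that point nothing on the server side records why the client got nothing back.

```java
import java.io.IOException;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch of a job lookup with an exceptionThrown flag; not the Hadoop source.
class JobLookupSketch {
    private static final Logger LOG = Logger.getLogger(JobLookupSketch.class.getName());

    // Hypothetical registry: job ID -> job state.
    private final Map<String, String> jobs;

    JobLookupSketch(Map<String, String> jobs) {
        this.jobs = jobs;
    }

    String verifyAndGetJob(String jobId, boolean exceptionThrown) throws IOException {
        String job = jobs.get(jobId);
        if (job == null) {
            if (exceptionThrown) {
                throw new IOException("Unknown job " + jobId);
            }
            // With exceptionThrown == false this line is the only trace of the miss.
            // FINE is java.util.logging's rough equivalent of DEBUG.
            LOG.fine("Job " + jobId + " not found, returning null");
        }
        return job;
    }
}
```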

bq. "Can you explain a little about the first two debug messages in method getProxy? The third debug message could be a warn message."
I'm not really sure whether those log messages are necessary. I just thought they could be useful, but we can drop them.

As for the third one, I think DEBUG level is fine: if the application ID is not found, the MR job has already completed, and in that case the caller is redirected to the JHS.
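The reasoning for keeping the third message at DEBUG can be sketched as follows. This is a hypothetical simplification, not {{getProxy}} itself: the set of known application IDs, the {{Target}} enum and the message text are assumptions. It only shows that the "not found" branch is an expected path (the job finished, so the client goes to the JHS) rather than an error.

```java
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch of choosing between the AM and the JHS; not the Hadoop source.
class ProxySelectorSketch {
    enum Target { APPLICATION_MASTER, JOB_HISTORY_SERVER }

    private static final Logger LOG = Logger.getLogger(ProxySelectorSketch.class.getName());

    // Hypothetical view of the application IDs that can still be found.
    private final Set<String> knownApplications;

    ProxySelectorSketch(Set<String> knownApplications) {
        this.knownApplications = knownApplications;
    }

    Target selectTarget(String applicationId) {
        if (!knownApplications.contains(applicationId)) {
            // Expected once the job has completed, so FINE (DEBUG) rather than WARNING.
            LOG.fine("Application " + applicationId + " not found, redirecting to the JHS");
            return Target.JOB_HISTORY_SERVER;
        }
        return Target.APPLICATION_MASTER;
    }
}
```

Logging this branch at WARN would flag every lookup of a finished job, which is the normal case for the JHS path.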

> Enhance logging related to retrieving Job
> -----------------------------------------
>
>                 Key: MAPREDUCE-7046
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7046
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7046-001.patch
>
>
> We recently encountered an interesting problem. In one case, Hive Driver was unable to retrieve the status of a MapReduce job. The following stack trace was printed:
> {noformat}
> [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - 2018-01-15 00:18:09,324 Stage-2 map = 0%,  reduce = 0%, Cumulative CPU 1679.31 sec
>  [main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Ended Job = job_1511036412170_1322169 with exception 'java.io.IOException(Could not find status of job:job_1511036412170_1322169)'
> java.io.IOException: Could not find status of job:job_1511036412170_1322169
> 	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
> 	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435)
> 	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
> 	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> 	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1782)
> 	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1539)
> 	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1318)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1127)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1115)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
> 	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
> 	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
> 	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
> 	at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:325)
> 	at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:302)
> 	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
> {noformat}
> We examined the logs from the JHS and the AM but didn't see anything suspicious. For some reason {{null}} was returned, and it's not obvious why. The MR job was running at this point.
> Some ideas:
> 1. We already have logging in place for JobClient->AM and JobClient->JHS communication, but it's at TRACE level, which could be too low. It might make more sense to raise it to DEBUG.
> 2. We need new {{LOG.debug()}} calls at some crucial points.
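Idea 1 amounts to moving existing client-side messages from TRACE to DEBUG. A minimal sketch of that, assuming {{java.util.logging}} in place of Hadoop's logger ({{FINEST}} playing the role of TRACE and {{FINE}} the role of DEBUG); the class names, logger name and message text are hypothetical. The {{isLoggable}} guard avoids building the message string when DEBUG is off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical helper for logging client -> AM / client -> JHS calls at DEBUG (FINE).
class ClientCallLogging {
    static final Logger LOG = Logger.getLogger("ClientCallLogging");

    static void logCall(String target, String method, String jobId) {
        // Previously imagined at TRACE (FINEST); raised to DEBUG (FINE) here.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Calling " + method + " on " + target + " for " + jobId);
        }
    }
}

// Collects published records so the effect of the level change can be checked.
class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override public void publish(LogRecord record) { records.add(record); }
    @Override public void flush() { }
    @Override public void close() { }
}
```

With the logger at INFO nothing is emitted; lowering it to FINE makes each call visible without enabling the much noisier TRACE output.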



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


