hadoop-mapreduce-issues mailing list archives

From "Aaron Baff (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call
Date Wed, 11 May 2011 16:37:47 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031835#comment-13031835 ]

Aaron Baff commented on MAPREDUCE-2470:

Yes, it's quite reproducible, at least for me. Once an MR Job has been retired for around
mapreduce.jobtracker.persist.jobstatus.hours hours (default 1 hour), the JobTracker removes
it from mapreduce.jobtracker.persist.jobstatus.dir (default /jobtracker/jobsInfo on HDFS).
Once it's removed, the Counters are no longer available when fetching information about the
Job. Going one step further, the JobTracker caches the info for the last 1000 (default) MR
Jobs in memory. Yet even if a Job is still in that cache and can be viewed through the Web
UI, once the JT has removed it from HDFS, fetching its Counters programmatically returns
NULL from the JT instead of an empty set of Counters.

So the real question for me is: is this bad behavior by the JobTracker? Or should the Hadoop
client library be the one to handle the NULL, and either pass NULL along to user code or create
an empty set of Counters? Whichever it is, it'd be very nice to have it documented in the
Javadocs, for the old API as well as the new API.
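
Until that is settled, user code can guard against the NULL itself. Below is a minimal sketch
of the "create an empty set of Counters" option, done on the caller's side. To keep it runnable
without Hadoop on the classpath, it uses a hypothetical stand-in interface (JobHandle) and a
plain Map in place of the real RunningJob and Counters types; those names and the counter name
are assumptions for illustration, not part of the Hadoop API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SafeCounters {

    /**
     * Hypothetical stand-in for a job handle such as RunningJob.
     * getCounters() may return null once the job's persisted status is gone.
     */
    interface JobHandle {
        Map<String, Long> getCounters() throws Exception;
    }

    /** Returns the job's counters, or an empty map if the source handed back null. */
    static Map<String, Long> countersOrEmpty(JobHandle job) throws Exception {
        Map<String, Long> counters = job.getCounters();
        if (counters == null) {
            // Treat "no counters available" as an empty set instead of letting null escape.
            return Collections.<String, Long>emptyMap();
        }
        return counters;
    }

    public static void main(String[] args) throws Exception {
        // Simulates a retired job whose counters are no longer available.
        JobHandle retired = () -> null;

        // Simulates a job whose counters can still be fetched.
        JobHandle live = () -> {
            Map<String, Long> m = new LinkedHashMap<>();
            m.put("MAP_INPUT_RECORDS", 42L); // illustrative counter name and value
            return m;
        };

        System.out.println("retired job has counters: " + !countersOrEmpty(retired).isEmpty());
        System.out.println("live job counters: " + countersOrEmpty(live));
    }
}
```

Note that in the stack trace quoted below the NPE is thrown inside the client library
(Counters.downgrade), before user code ever sees a return value, so with the real API a
guard like this would only help once the library itself passes the NULL through.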

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>         Attachments: counters_null_data.pcap
> This is running in a Java daemon that is used as an interface (Thrift) to get information
> and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object
> (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters().
> This seems to occur after the daemon has been up and running for a while, and in the event
> of an Exception, I close the JobClient, set it to NULL, and a new one should then be created
> on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below
> is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
