Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mapreduce-issues@hadoop.apache.org
Date: Wed, 11 May 2011 16:37:47 +0000 (UTC)
From: "Aaron Baff (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: 
 <576106761.3561.1305131867702.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <1194125159.19919.1304460544410.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on
 RunningJob.getCounters() call
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031835#comment-13031835 ] 

Aaron Baff commented on MAPREDUCE-2470:
---------------------------------------

Yes, it's quite reproducible, at least for me. Once an MR Job has been retired for around mapreduce.jobtracker.persist.jobstatus.hours hours (default 1 hour), the JobTracker removes it from mapreduce.jobtracker.persist.jobstatus.dir (default /jobtracker/jobsInfo on HDFS). Once it's removed, the Counters are no longer available when fetching information about the Job. Going one step further, the JobTracker caches the info for the last 1000 (default) MR Jobs in memory. Yet even if a Job is still in that cache and can be viewed through the Web UI, once the JT has removed it from HDFS, fetching the Counters programmatically still gets NULL back from the JT instead of an empty set of Counters.

So the real question for me is: is this bad behavior by the JobTracker? Or should the Hadoop client library be the one to handle the NULL, either passing NULL along to user code or creating an empty set of Counters? Whichever it is, it'd be very nice to have it documented in the Javadocs for both the old API and the new API.
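
Until that is settled, a caller-side guard is one possible workaround. The following is only a sketch of the "create an empty set of Counters" option, not a proposed patch: it does not use the Hadoop classes at all. SafeCounters and fetchOrEmpty are hypothetical names, the Map<String, Long> stands in for a Counters object, and the Supplier stands in for a call such as RunningJob.getCounters(). It treats both a NULL result and the NPE from the stack trace below as "no counters available".

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper; none of these names exist in Hadoop.
final class SafeCounters {

    private SafeCounters() {}

    // 'fetch' stands in for a call like RunningJob.getCounters().
    // Map<String, Long> is an assumed stand-in for a Counters object.
    static Map<String, Long> fetchOrEmpty(Supplier<Map<String, Long>> fetch) {
        try {
            Map<String, Long> counters = fetch.get();
            // Case 1: the call itself hands back null.
            return counters != null ? counters : Collections.<String, Long>emptyMap();
        } catch (NullPointerException e) {
            // Case 2: the client library dereferences the null internally
            // and throws, as in the reported stack trace.
            return Collections.<String, Long>emptyMap();
        }
    }

    public static void main(String[] args) {
        // Simulate a retired job whose counters are gone.
        Map<String, Long> gone = fetchOrEmpty(() -> null);
        System.out.println("counters available: " + !gone.isEmpty());
    }
}
```

Catching NullPointerException this broadly can hide unrelated bugs in the wrapped call, so in real code the guard should wrap only the single getCounters() call and nothing else.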

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira