Date: Wed, 11 May 2011 16:37:47 +0000 (UTC)
From: "Aaron Baff (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

    [
https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031835#comment-13031835 ]

Aaron Baff commented on MAPREDUCE-2470:
---------------------------------------

Yes, it's quite reproducible, at least for me. Once an MR Job has been retired for around mapreduce.jobtracker.persist.jobstatus.hours hours (default 1 hour), the JobTracker removes it from mapreduce.jobtracker.persist.jobstatus.dir (default /jobtracker/jobsInfo on HDFS). Once it's removed, the Counters are no longer available when fetching information about the Job.

Going one step further, the JobTracker caches the info for the last 1000 (default) MR Jobs in memory. Yet even if a Job is still in that cache and can be viewed through the Web UI, once the JT has removed it from HDFS, fetching its Counters programmatically still gets NULL back from the JT instead of an empty set of Counters.

So the real question for me is: is this bad behavior by the JobTracker? Or should the Hadoop client library be the one to handle the NULL, and either pass NULL along to user code or create an empty set of Counters? Whichever it is, it'd be very nice to have it documented in the Javadocs, for the old API as well as the new API.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs.
> Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then occasionally I get an NPE when I call RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while. In the event of an Exception, I close the JobClient and set it to NULL, so that a new one is created on the next request for data. Yet I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>     at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>     at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>     at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>     at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>     at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>     at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>     at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
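
For anyone hitting the same thing from user code: per the stack trace above, the NPE is thrown inside getCounters() itself (in Counters.downgrade), so checking the return value for NULL is not enough on its own. Below is a minimal sketch of a caller-side guard. It is not part of Hadoop; the class SafeCounters, the method fetchOrEmpty, and the use of a plain Map<String, Long> as a stand-in for a Counters object are all assumptions made to keep the example self-contained.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Hadoop: treats "no counters" as an empty map. */
final class SafeCounters {
    private SafeCounters() {}

    /**
     * Runs the supplied fetch and returns an empty map when the fetch either
     * returns null or throws a NullPointerException. Any other exception
     * (for example an IOException from the RPC) is propagated unchanged.
     */
    static Map<String, Long> fetchOrEmpty(Callable<Map<String, Long>> fetch) throws Exception {
        try {
            Map<String, Long> counters = fetch.call();
            if (counters == null) {
                // The fetch itself handed back null.
                return Collections.<String, Long>emptyMap();
            }
            return counters;
        } catch (NullPointerException e) {
            // Assumption: an NPE here means the counters were unavailable,
            // as in the Counters.downgrade() frame of the reported trace.
            return Collections.<String, Long>emptyMap();
        }
    }
}
```

In a daemon like the one described, the Callable would wrap the RunningJob.getCounters() call plus whatever conversion the caller needs. Catching NullPointerException this broadly can also hide unrelated bugs inside the fetch, so this is a stopgap on the caller's side and not an answer to whether the JobTracker or the client library should handle the NULL.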