From: "dhruba borthakur (JIRA)"
To: core-dev@hadoop.apache.org
Date: Thu, 2 Oct 2008 11:07:44 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-4296) Spasm of JobClient failures on successful jobs every once in a while
Message-ID: <508670032.1222970864322.JavaMail.jira@brutus>
In-Reply-To: <1228560926.1222535626082.JavaMail.jira@brutus>

    [ 
https://issues.apache.org/jira/browse/HADOOP-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636413#action_12636413 ] 

dhruba borthakur commented on HADOOP-4296:
------------------------------------------

> However, since we already do dfs scans per job inline in the RPC handler (and AFAIK there is no noticeable impact)

Can you please explain this already existing dfs scan that the JT does? When does it do it?

> Spasm of JobClient failures on successful jobs every once in a while
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4296
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Joydeep Sen Sarma
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: 4296_jt_delayretire.patch
>
>
> At very busy times - we get a wave of job client failures all at the same time. the failures come when the job is about to complete. when we look at the job history files - the jobs are actually complete. Here's the stack:
> 08/09/27 02:18:00 INFO mapred.JobClient:  map 100% reduce 98%
> 08/09/27 02:18:41 INFO mapred.JobClient:  map 100% reduce 99%
> java.lang.NullPointerException
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
>         at com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.