From: Josh Wills
Date: Fri, 30 Oct 2015 08:49:31 -0700
Subject: Re: NullPointerExceptions in handleMultiPaths CompletionHook
To: dev
Reply-To: dev@crunch.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a113b6976bcda2c05235460bd

--001a113b6976bcda2c05235460bd
Content-Type: text/plain; charset=UTF-8

David! Welcome back!

I haven't hit that one before; if you tweak handleMultiPaths to look like
the below, does it fix the issue?

J

  private synchronized void handleMultiPaths(MRJob job) throws IOException {
    try {
      if (job.getJobState() == MRJob.State.SUCCESS) {
        if (!multiPaths.isEmpty()) {
          for (Map.Entry entry : multiPaths.entrySet()) {
            entry.getValue().handleOutputs(job.getJob().getConfiguration(),
                workingPath, entry.getKey());
          }
        }
      }
    } catch (Exception ie) {
      throw new IOException(ie);
    }
  }

On Fri, Oct 30, 2015 at 8:21 AM, David Whiting wrote:

> Hi everybody!
> I'm back and pushing Crunch in a new organisation
>
> I'm having some strange non-deterministic problems with the end of my
> Crunch job executions in a new environment - I've got some possible ideas
> as to why it's happening, but no good ideas for workarounds so I was hoping
> somebody might be able to help me out. Basically, this is what it looks
> like:
>
> 15/10/30 15:01:55 INFO jobcontrol.CrunchControlledJob: Running job "crunching.CountEventsByType: SeqFile([{REDACTED}... ID=1 (1/1)"
> 15/10/30 15:01:55 INFO jobcontrol.CrunchControlledJob: Job status available at: {REDACTED}/proxy/application_1443106319465_13029/
> 15/10/30 15:05:02 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 15/10/30 15:05:03 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 15/10/30 15:05:04 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 15/10/30 15:05:04 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED.
> Redirecting to job history server
> 15/10/30 15:05:04 ERROR exec.MRExecutor: Pipeline failed due to exception
> java.io.IOException: java.lang.NullPointerException
>     at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:99)
>     at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.run(CrunchJobHooks.java:86)
>     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkRunningState(CrunchControlledJob.java:288)
>     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkState(CrunchControlledJob.java:299)
>     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.checkRunningJobs(CrunchJobControl.java:201)
>     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:321)
>     at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:131)
>     at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:58)
>     at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:90)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
>     at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:632)
>     at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:91)
>     ... 9 more
>
> The corresponding line in the Hadoop source is this:
>
>     return cluster.getClient().getJobStatus(status.getJobID());
>
> The only NPE-generating part of this is that getClient() could return null,
> but I'm not exactly sure what could cause that. We have some intermittent
> problems with our job history server (returning "not found" for whatever
> job it looks up) which could well be correlated to this, but I would expect
> that to fail at the getJobStatus part rather than the getClient part. This
> would, however, agree with the fact that the job reports itself as SUCCEEDED
> before it fails during the handleMultiPaths section (as perhaps the request
> to check status there will get routed to the job history server).
>
> This happens with any Crunch jobs I try to run on this cluster, but there
> are plenty of "plain old MapReduce" jobs running on this cluster with no
> issues, so I'm struggling to find reasons why Crunch would fail where the
> others are succeeding.
>
> Thanks,
> David
>

--001a113b6976bcda2c05235460bd--