crunch-dev mailing list archives

From David Whiting <davidwhit...@gmail.com>
Subject NullPointerExceptions in handleMultiPaths CompletionHook
Date Fri, 30 Oct 2015 15:21:56 GMT
Hi everybody! I'm back and pushing Crunch in a new organisation.

I'm having some strange, non-deterministic problems at the end of my
Crunch job executions in a new environment. I have some possible ideas
as to why it's happening but no good ideas for workarounds, so I was hoping
somebody might be able to help me out. Basically, this is what it looks
like:

15/10/30 15:01:55 INFO jobcontrol.CrunchControlledJob: Running job "crunching.CountEventsByType: SeqFile([{REDACTED}... ID=1 (1/1)"
15/10/30 15:01:55 INFO jobcontrol.CrunchControlledJob: Job status available at: {REDACTED}/proxy/application_1443106319465_13029/
15/10/30 15:05:02 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/10/30 15:05:03 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/10/30 15:05:04 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/10/30 15:05:04 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
15/10/30 15:05:04 ERROR exec.MRExecutor: Pipeline failed due to exception
java.io.IOException: java.lang.NullPointerException
        at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:99)
        at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.run(CrunchJobHooks.java:86)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkRunningState(CrunchControlledJob.java:288)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkState(CrunchControlledJob.java:299)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.checkRunningJobs(CrunchJobControl.java:201)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:321)
        at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:131)
        at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:58)
        at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:90)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
        at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
        at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:632)
        at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:91)
        ... 9 more

The corresponding line in the Hadoop source is this:

return cluster.getClient().getJobStatus(status.getJobID());

The only NPE-generating part of this is that getClient() could return null,
but I'm not sure what could cause that. We have some intermittent
problems with our job history server (it returns "not found" for whatever
job it looks up), which could well be correlated with this, but I would
expect that to fail at the getJobStatus part rather than the getClient part.
It would, however, agree with the fact that the job reports itself as
SUCCEEDED before it fails in the handleMultiPaths section (perhaps because
the status check there gets routed to the job history server).
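For reference, the quoted line dereferences three values: `cluster`, whatever `getClient()` returns, and `status`. A null in any of them would raise an NPE on that line, whereas a lookup that merely returns null (say, a history server answering "not found") would not throw there. The sketch below illustrates this with hypothetical stand-in classes (`FakeCluster`, `FakeClient`, `FakeJobStatus` are invented for illustration and are not Hadoop's real types) that mirror the shape of the expression.

```java
/**
 * Hypothetical stand-ins mirroring the shape of the Hadoop expression
 *   cluster.getClient().getJobStatus(status.getJobID())
 * None of these types are the real Hadoop classes.
 */
public class NullChainDemo {

    // Stand-in for a job status; only the job ID matters here.
    static final class FakeJobStatus {
        private final String jobId;
        FakeJobStatus(String jobId) { this.jobId = jobId; }
        String getJobID() { return jobId; }
    }

    // Stand-in for the client; a lookup can "miss" and return null,
    // loosely modelling a history server that answers "not found".
    static final class FakeClient {
        private final boolean findsJob;
        FakeClient(boolean findsJob) { this.findsJob = findsJob; }
        FakeJobStatus getJobStatus(String jobId) {
            return findsJob ? new FakeJobStatus(jobId) : null;
        }
    }

    // Stand-in for the cluster; getClient() may return null.
    static final class FakeCluster {
        private final FakeClient client;
        FakeCluster(FakeClient client) { this.client = client; }
        FakeClient getClient() { return client; }
    }

    // Evaluates the same dereference chain and reports the outcome.
    static String probe(FakeCluster cluster, FakeJobStatus status) {
        try {
            FakeJobStatus result = cluster.getClient().getJobStatus(status.getJobID());
            return result == null ? "null status" : "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        FakeJobStatus status = new FakeJobStatus("job_1");
        System.out.println(probe(new FakeCluster(new FakeClient(true)), status));  // ok
        System.out.println(probe(new FakeCluster(new FakeClient(false)), status)); // null status
        System.out.println(probe(new FakeCluster(null), status));                  // NPE (null client)
        System.out.println(probe(new FakeCluster(new FakeClient(true)), null));    // NPE (null status)
        System.out.println(probe(null, status));                                   // NPE (null cluster)
    }
}
```

Only the "not found" case comes back without an exception, which is consistent with the expectation above that a history server miss alone would not produce an NPE at that line.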

This happens with every Crunch job I try to run on this cluster, but
plenty of "plain old MapReduce" jobs run on the same cluster with no issues,
so I'm struggling to find a reason why Crunch would fail where the others
succeed.

Thanks,
David
