crunch-dev mailing list archives

From David Whiting <davidwhit...@gmail.com>
Subject Re: NullPointerExceptions in handleMultiPaths CompletionHook
Date Wed, 04 Nov 2015 14:57:51 GMT
Pretty sure this was just tied up with our job history server issues. We've
fixed those and Crunch seems to be happily crunching again now :-)

On 3 November 2015 at 12:01, David Whiting <davidwhiting@gmail.com> wrote:

> Different problem if I try that :-(
>
> 15/11/03 10:54:06 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> 1 job failure(s) occurred:
> 15/11/03 10:54:16 ERROR exec.MRExecutor: Pipeline failed due to exception
> java.lang.NullPointerException
>         at org.apache.hadoop.mapreduce.Job.getJobName(Job.java:442)
>         at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.getJobName(CrunchControlledJob.java:131)
>         at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:140)
>         at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:58)
>         at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:90)
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
> On 30 October 2015 at 16:49, Josh Wills <josh.wills@gmail.com> wrote:
>
>> David! Welcome back!
>>
>> I haven't hit that one before; if you tweak handleMultiPaths to look like
>> the below, does it fix the issue?
>>
>> J
>>
>> private synchronized void handleMultiPaths(MRJob job) throws IOException {
>>   try {
>>     if (job.getJobState() == MRJob.State.SUCCESS) {
>>       if (!multiPaths.isEmpty()) {
>>         for (Map.Entry<Integer, PathTarget> entry : multiPaths.entrySet()) {
>>           entry.getValue().handleOutputs(job.getJob().getConfiguration(), workingPath, entry.getKey());
>>         }
>>       }
>>     }
>>   } catch (Exception ie) {
>>     throw new IOException(ie);
>>   }
>> }
>>
>>
>> On Fri, Oct 30, 2015 at 8:21 AM, David Whiting <davidwhiting@gmail.com> wrote:
>>
>> > Hi everybody! I'm back and pushing Crunch in a new organisation.
>> >
>> > I'm having some strange non-deterministic problems at the end of my
>> > Crunch job executions in a new environment. I've got some possible ideas
>> > as to why it's happening, but no good ideas for workarounds, so I was
>> > hoping somebody might be able to help me out. Basically, this is what it
>> > looks like:
>> >
>> > 15/10/30 15:01:55 INFO jobcontrol.CrunchControlledJob: Running job "crunching.CountEventsByType: SeqFile([{REDACTED}... ID=1 (1/1)"
>> > 15/10/30 15:01:55 INFO jobcontrol.CrunchControlledJob: Job status available at: {REDACTED}/proxy/application_1443106319465_13029/
>> > 15/10/30 15:05:02 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
>> > 15/10/30 15:05:03 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
>> > 15/10/30 15:05:04 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
>> > 15/10/30 15:05:04 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
>> > 15/10/30 15:05:04 ERROR exec.MRExecutor: Pipeline failed due to exception
>> > java.io.IOException: java.lang.NullPointerException
>> >         at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:99)
>> >         at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.run(CrunchJobHooks.java:86)
>> >         at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkRunningState(CrunchControlledJob.java:288)
>> >         at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkState(CrunchControlledJob.java:299)
>> >         at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.checkRunningJobs(CrunchJobControl.java:201)
>> >         at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:321)
>> >         at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:131)
>> >         at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:58)
>> >         at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:90)
>> >         at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.lang.NullPointerException
>> >         at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
>> >         at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
>> >         at java.security.AccessController.doPrivileged(Native Method)
>> >         at javax.security.auth.Subject.doAs(Subject.java:422)
>> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>> >         at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
>> >         at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:632)
>> >         at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:91)
>> >         ... 9 more
>> >
>> > The corresponding line in the Hadoop source is this:
>> >
>> > return cluster.getClient().getJobStatus(status.getJobID());
>> >
>> > The only NPE-generating part of this is that getClient() could return
>> > null, but I'm not exactly sure what could cause that. We have some
>> > intermittent problems with our job history server (returning "not found"
>> > for whatever job it looks up) which could well be correlated with this,
>> > but I would expect that to fail at the getJobStatus part rather than the
>> > getClient part. This would, however, agree with the fact that the job
>> > reports itself as SUCCEEDED before it fails during the handleMultiPaths
>> > section (as perhaps the request to check status there will get routed to
>> > the job history server).
>> >
>> > This happens with any Crunch job I try to run on this cluster, but there
>> > are plenty of "plain old MapReduce" jobs running on this cluster with no
>> > issues, so I'm struggling to find reasons why Crunch would fail where the
>> > others are succeeding.
>> >
>> > Thanks,
>> > David
>> >
>>
>
>
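
Illustrative sketch of the failure mode discussed in this thread: a job that has already succeeded is reported as a pipeline failure because the status lookup itself throws. The code below is not Crunch or Hadoop code. StatusProbe, Outcome, and probe are hypothetical names invented for this example, and the Supplier stands in for a call such as Job.isSuccessful(). It only shows one possible way to separate "status unavailable" from "job failed" by retrying the lookup a bounded number of times.

```java
import java.util.function.Supplier;

/** Hypothetical illustration; none of these types are Crunch or Hadoop APIs. */
public final class StatusProbe {

  /** Possible outcomes of a status lookup in this sketch. */
  public enum Outcome { SUCCESS, FAILED, UNKNOWN }

  /**
   * Calls the lookup up to maxAttempts times. A RuntimeException (such as the
   * NullPointerException seen in the thread) is treated as "status
   * unavailable" and retried, rather than being reported as a job failure.
   */
  public static Outcome probe(Supplier<Boolean> isSuccessful, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return isSuccessful.get() ? Outcome.SUCCESS : Outcome.FAILED;
      } catch (RuntimeException e) {
        // The status backend could not answer; fall through and try again.
      }
    }
    // Every attempt threw, so the real job state is simply not known.
    return Outcome.UNKNOWN;
  }

  public static void main(String[] args) {
    // Simulated lookup that throws twice before answering.
    int[] calls = {0};
    Supplier<Boolean> flaky = () -> {
      if (calls[0]++ < 2) {
        throw new NullPointerException("simulated missing client");
      }
      return true;
    };
    System.out.println(probe(flaky, 3)); // prints SUCCESS

    // Simulated lookup that never answers.
    Supplier<Boolean> broken = () -> {
      throw new NullPointerException("simulated missing client");
    };
    System.out.println(probe(broken, 3)); // prints UNKNOWN
  }
}
```

In the thread itself the problem went away once the job history server was fixed, so a guard like this would only be a client-side mitigation and not a fix for the underlying cause.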
