crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Getting exception for a crunch job when running on Elastic Map Reduce, which runs fine locally
Date Tue, 09 Sep 2014 15:02:43 GMT
Hey,

Sorry I missed this. I only really monitor the user@crunch.apache.org
mailing list closely. Which version of Crunch were you running when you got
the exception? The usual explanation for that kind of error is that an
upstream job failed, in which case you should see the error in the
JobHistory server. It's also possible that we're not handling the S3/EMR
FileSystem correctly, which has happened before, e.g.,

http://mail-archives.apache.org/mod_mbox/crunch-user/201310.mbox/%3CCAHCsPn8pvqJ6aJWcEqk4R3YZ8gu_MYVig+pqFEgw-AfSAD277w@mail.gmail.com%3E
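
One way to narrow this down is to check whether the directory named in the exception exists and holds any data files once the upstream stage has finished. The sketch below only illustrates that check with plain java.nio on a local path. It is not how Crunch itself looks for files, and the class name MaterializeDirCheck and the helper dataFiles are made up for this example. It assumes the usual Hadoop convention that names starting with "_" or "." (for example _SUCCESS) are metadata rather than data. On EMR the same idea would have to go through the Hadoop FileSystem API against HDFS or S3 instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Crunch or Hadoop.
class MaterializeDirCheck {

    /**
     * Lists the regular files directly under dir, skipping names that start
     * with '_' or '.' (assumed here to be metadata such as _SUCCESS).
     * Returns an empty list if dir does not exist or is not a directory.
     */
    public static List<Path> dataFiles(Path dir) throws IOException {
        List<Path> found = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return found; // missing directory: nothing to materialize
        }
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> {
                       String name = p.getFileName().toString();
                       return !name.startsWith("_") && !name.startsWith(".");
                   })
                   .sorted()
                   .forEach(found::add);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the path from the stack trace; pass another path as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/crunch-1412622901/p4");
        List<Path> files = dataFiles(dir);
        if (files.isEmpty()) {
            System.out.println("No data files under " + dir
                    + "; the stage that should have written them probably failed.");
        } else {
            for (Path p : files) {
                System.out.println(p + " (" + Files.size(p) + " bytes)");
            }
        }
    }
}
```

If the directory is missing or empty, the failure most likely happened in the job that was supposed to write it, so its logs are the next place to look.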

Josh

On Mon, Sep 8, 2014 at 8:19 AM, <fahdsiddiqui007@gmail.com> wrote:

> I am trying to get some direction on how to go about debugging this issue. I run my
> Crunch job on a local Hadoop setup, and it works fine. I understand that the local data
> is much smaller, but the files that I am trying to diff have the same structure (dumps
> from different sources). The stack trace is as follows:
>
>
> No files found to materialize at: /tmp/crunch-1412622901/p4
> 	at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
> 	at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
> 	at org.apache.crunch.materialize.pobject.CollectionPObject.process(CollectionPObject.java:49)
> 	at org.apache.crunch.materialize.pobject.CollectionPObject.process(CollectionPObject.java:34)
> 	at org.apache.crunch.materialize.pobject.PObjectImpl.getValue(PObjectImpl.java:70)
> 	at <redacted>(BulkDiffCommand.java:126)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
>
> The line in my code that invokes the above calls the following method:
>
>
> tables.asCollection().getValue()
>
>
> Also, this error occurs after the pipeline has been running for a few hours, when it
> fails on one of its subsequent jobs. I've looked at the Crunch source code, and the
> CompositePath.create() method basically couldn't find the path noted above. I am trying
> a few things out, but any ideas on how to go about debugging this?
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
