crunch-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: CDH5
Date Thu, 12 Jun 2014 06:59:16 GMT
Yes, a pseudo-distributed CDH5, but I realize now that I haven't
installed the apt packages for Crunch. I'm using the DistCache to
upload crunch-core-0.9.0-cdh5.0.0.jar instead. Does it matter?

One thing I noticed is that you're running
hadoop-client-2.3.0-cdh5.0.0, whereas I'm using
hadoop-yarn-client-2.3.0-cdh5.0.0. Also, when I try to install Crunch
using apt, I see that it depends on hadoop-0.20-mapreduce and
hadoop-client.

I may be confused, but I thought that YARN would be backward compatible
with MRv1?
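
For readers comparing the paths in the quoted report below, here is a minimal sketch in plain Java (no Hadoop or Crunch dependency) that splits a Crunch temp path into its directory id and "p" index and reports which parts differ. The class `CrunchTmpPath` and its methods are hypothetical helpers, not part of Crunch, and the `/tmp/crunch-<digits>/p<digits>` layout is an assumption inferred only from the paths shown in this thread. The two sample paths are copied from the quoted HDFS listing and stack trace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Crunch API.
final class CrunchTmpPath {
    // Assumed layout, inferred from the paths in this thread:
    // /tmp/crunch-<digits>/p<digits>[/anything]
    private static final Pattern LAYOUT =
        Pattern.compile("^/tmp/crunch-(\\d+)/p(\\d+)(?:/.*)?$");

    final String dirId; // the digits after "crunch-"
    final int index;    // the number after "p"

    private CrunchTmpPath(String dirId, int index) {
        this.dirId = dirId;
        this.index = index;
    }

    // Parses a path, or throws if it does not follow the assumed layout.
    static CrunchTmpPath parse(String path) {
        Matcher m = LAYOUT.matcher(path);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a Crunch temp path: " + path);
        }
        return new CrunchTmpPath(m.group(1), Integer.parseInt(m.group(2)));
    }

    // Describes how the path the client expects differs from one found in HDFS.
    static String describeMismatch(String expectedPath, String actualPath) {
        CrunchTmpPath expected = parse(expectedPath);
        CrunchTmpPath actual = parse(actualPath);
        StringBuilder sb = new StringBuilder();
        if (!expected.dirId.equals(actual.dirId)) {
            sb.append("directory differs: crunch-").append(expected.dirId)
              .append(" vs crunch-").append(actual.dirId);
        }
        if (expected.index != actual.index) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append("index differs: p").append(expected.index)
              .append(" vs p").append(actual.index);
        }
        return sb.length() == 0 ? "same temp directory and index" : sb.toString();
    }

    public static void main(String[] args) {
        // Path from the exception message, then a path from the HDFS listing.
        String expected = "/tmp/crunch-1611606737/p1";
        String actual = "/tmp/crunch-134908575/p2/output/out0-m-00000";
        System.out.println(describeMismatch(expected, actual));
    }
}
```

Run against those two paths, the helper flags both the directory id and the index, which suggests the listing and the stack trace may come from different runs; that is worth ruling out before treating the p1/p2 difference as a sync issue.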

On Wed, Jun 11, 2014 at 6:41 PM, Josh Wills <jwills@cloudera.com> wrote:
> Hey Kristoffer,
>
> Couldn't reproduce that in my crunch-demo project against my test cluster:
>
> https://github.com/jwills/crunch-demo/tree/cdh5
>
> So I hate asking dumb questions, but are you running against a CDH5 cluster?
>
> J
>
>
> On Wed, Jun 11, 2014 at 9:11 AM, Josh Wills <josh.wills@gmail.com> wrote:
>>
>> That's very odd; let me see if I can reproduce it.
>>
>> J
>>
>>
>> On Wed, Jun 11, 2014 at 7:23 AM, Kristoffer Sjögren <stoffe@gmail.com>
>> wrote:
>>>
>>> Hi
>>>
>>> I'm trying out Crunch on YARN on CDH5 (0.9.0-cdh5.0.0) and get some
>>> errors when trying to materialize results (see below). The job itself
>>> is super simple.
>>>
>>> PCollection<String> lines = pipeline.read(new TextFileSource<String>(
>>>     new Path("hdfs://....log"), Writables.strings()));
>>>
>>> lines = lines.parallelDo(new DoFn<String, String>() {
>>>   @Override
>>>   public void process(String s, Emitter<String> e) {
>>>     e.emit(s);
>>>   }
>>> }, Writables.strings());
>>>
>>> for (String line : lines.materialize()) {
>>>   System.out.println(line);
>>> }
>>>
>>>
>>> Seems like there's some kind of sync issue here because I can see the
>>> "correct" tmp dir in hdfs. Note that the p index is "p2" in hdfs while
>>> the client looks for "p1".
>>>
>>> -rw-r--r--   1 kristoffersjogren supergroup       1748 2014-06-11 15:36 /tmp/crunch-134908575/p2/MAP
>>> drwxr-xr-x   - kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output
>>> -rw-r--r--   1 kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/_SUCCESS
>>> -rw-r--r--   1 kristoffersjogren supergroup   42898831 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/out0-m-00000
>>> -rw-r--r--   1 kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/part-m-00000
>>>
>>>
>>> If I try to write directly to HDFS using the following, the job finishes
>>> successfully, but nothing is written.
>>>
>>> pipeline.write(lines, new TextFileSourceTarget<String>("/user/stoffe",
>>> Writables.strings()), WriteMode.OVERWRITE);
>>>
>>>
>>> Any ideas about what might be going wrong?
>>>
>>> Cheers,
>>> -Kristoffer
>>>
>>>
>>>
>>> Exception in thread "main" java.lang.RuntimeException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>> at mapred.CrunchJob.<init>(CrunchJob.java:36)
>>> at mapred.tempjobs.DownloadFiles.<init>(DownloadFiles.java:16)
>>> at mapred.tempjobs.DownloadFiles.main(DownloadFiles.java:20)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:483)
>>> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
>>> Caused by: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>> at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
>>> at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
>>> at mapred.tempjobs.DownloadFiles.run(DownloadFiles.java:37)
>>> at mapred.CrunchJob.run(CrunchJob.java:96)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> at mapred.CrunchJob.<init>(CrunchJob.java:34)
>>> ... 7 more
>>> Caused by: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>> at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
>>> at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:136)
>>> at org.apache.crunch.io.seq.SeqFileSource.read(SeqFileSource.java:43)
>>> at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:37)
>>> at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:76)
>>> ... 12 more
>>
>>
>
>
>
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
