crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: CDH5
Date Wed, 11 Jun 2014 16:11:05 GMT
That's very odd; let me see if I can reproduce it.

J


On Wed, Jun 11, 2014 at 7:23 AM, Kristoffer Sjögren <stoffe@gmail.com>
wrote:

> Hi
>
> I'm trying out Crunch on YARN on CDH5 (0.9.0-cdh5.0.0) and get some
> errors when trying to materialize results (see below). The job itself
> is super simple.
>
> PCollection<String> lines = pipeline.read(new TextFileSource<String>(
>     new Path("hdfs://....log"), Writables.strings()));
>
> lines = lines.parallelDo(new DoFn<String, String>() {
>   @Override
>   public void process(String s, Emitter<String> e) {
>     e.emit(s);
>   }
> }, Writables.strings());
>
> for (String line : lines.materialize()) {
>   System.out.println(line);
> }
>
>
> This looks like some kind of sync issue, because I can see the
> "correct" tmp dir in HDFS. Note that the p index is "p2" in HDFS while
> the client looks for "p1".
>
> -rw-r--r--   1 kristoffersjogren supergroup       1748 2014-06-11 15:36 /tmp/crunch-134908575/p2/MAP
> drwxr-xr-x   - kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output
> -rw-r--r--   1 kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/_SUCCESS
> -rw-r--r--   1 kristoffersjogren supergroup   42898831 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/out0-m-00000
> -rw-r--r--   1 kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/part-m-00000
>
>
> If I try to write directly to HDFS using the following, the job
> finishes successfully, but nothing is written.
>
> pipeline.write(lines, new TextFileSourceTarget<String>("/user/stoffe",
> Writables.strings()), WriteMode.OVERWRITE);
>
>
> Any ideas about what might be going wrong?
>
> Cheers,
> -Kristoffer
>
>
>
> Exception in thread "main" java.lang.RuntimeException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>     at mapred.CrunchJob.<init>(CrunchJob.java:36)
>     at mapred.tempjobs.DownloadFiles.<init>(DownloadFiles.java:16)
>     at mapred.tempjobs.DownloadFiles.main(DownloadFiles.java:20)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
>     at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
>     at mapred.tempjobs.DownloadFiles.run(DownloadFiles.java:37)
>     at mapred.CrunchJob.run(CrunchJob.java:96)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at mapred.CrunchJob.<init>(CrunchJob.java:34)
>     ... 7 more
> Caused by: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>     at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
>     at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:136)
>     at org.apache.crunch.io.seq.SeqFileSource.read(SeqFileSource.java:43)
>     at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:37)
>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:76)
>     ... 12 more
>
