crunch-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: CDH5
Date Thu, 12 Jun 2014 07:56:42 GMT
Ok, so I got it working now after doing apt install crunch on the name
node. Not really sure why that fixed the problem, though?

And I'm submitting the job using the YARN client, with the following dependencies.

    <dependency>
      <groupId>org.apache.crunch</groupId>
      <artifactId>crunch-core</artifactId>
      <version>0.9.0-cdh5.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-client</artifactId>
      <version>2.3.0-cdh5.0.0</version>
    </dependency>
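
For what it's worth, since Josh's working setup (quoted below) runs
hadoop-client-2.3.0-cdh5.0.0 rather than hadoop-yarn-client, the
corresponding dependency would presumably look like this (an untested
guess on my side, reusing the same CDH5 version as above):

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.3.0-cdh5.0.0</version>
    </dependency>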


On Thu, Jun 12, 2014 at 8:59 AM, Kristoffer Sjögren <stoffe@gmail.com> wrote:
> Yes, a pseudo-distributed CDH5, but I realize now that I haven't
> installed the apt packages for crunch. I'm using the DistCache to
> upload crunch-core-0.9.0-cdh5.0.0.jar instead. Does it matter?
>
> One thing I noticed is that you're running
> hadoop-client-2.3.0-cdh5.0.0 whereas I'm using
> hadoop-yarn-client-2.3.0-cdh5.0.0. Also, when I try to install crunch
> using apt I see that it depends on hadoop-0.20-mapreduce and
> hadoop-client.
>
> I may be confused, but I thought that YARN would be backward compatible
> with MRv1?
>
> On Wed, Jun 11, 2014 at 6:41 PM, Josh Wills <jwills@cloudera.com> wrote:
>> Hey Kristoffer,
>>
>> Couldn't reproduce that in my crunch-demo project against my test cluster:
>>
>> https://github.com/jwills/crunch-demo/tree/cdh5
>>
>> So I hate asking dumb questions, but are you running against a CDH5 cluster?
>>
>> J
>>
>>
>> On Wed, Jun 11, 2014 at 9:11 AM, Josh Wills <josh.wills@gmail.com> wrote:
>>>
>>> That's very odd; let me see if I can reproduce it.
>>>
>>> J
>>>
>>>
>>> On Wed, Jun 11, 2014 at 7:23 AM, Kristoffer Sjögren <stoffe@gmail.com>
>>> wrote:
>>>>
>>>> Hi
>>>>
>>>> I'm trying out Crunch on YARN on CDH5 (0.9.0-cdh5.0.0) and get some
>>>> errors when trying to materialize results (see below). The job itself
>>>> is super simple.
>>>>
>>>> PCollection<String> lines = pipeline.read(new TextFileSource<String>(
>>>>     new Path("hdfs://....log"), Writables.strings()));
>>>>
>>>> lines = lines.parallelDo(new DoFn<String, String>() {
>>>>   @Override
>>>>   public void process(String s, Emitter<String> e) {
>>>>     e.emit(s);
>>>>   }
>>>> }, Writables.strings());
>>>>
>>>> for (String line : lines.materialize()) {
>>>>   System.out.println(line);
>>>> }
>>>>
>>>>
>>>> Seems like there's some kind of sync issue here, because I can see the
>>>> "correct" tmp dir in HDFS. Note that the p index is "p2" in HDFS while
>>>> the client looks for "p1".
>>>>
>>>> -rw-r--r--   1 kristoffersjogren supergroup       1748 2014-06-11 15:36 /tmp/crunch-134908575/p2/MAP
>>>> drwxr-xr-x   - kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output
>>>> -rw-r--r--   1 kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/_SUCCESS
>>>> -rw-r--r--   1 kristoffersjogren supergroup   42898831 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/out0-m-00000
>>>> -rw-r--r--   1 kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/part-m-00000
>>>>
>>>>
>>>> If I try to write directly to HDFS using the following, the job finishes
>>>> successfully, but nothing is written?
>>>>
>>>> pipeline.write(lines, new TextFileSourceTarget<String>("/user/stoffe",
>>>> Writables.strings()), WriteMode.OVERWRITE);
>>>>
>>>>
>>>> Any ideas on what might be going wrong?
>>>>
>>>> Cheers,
>>>> -Kristoffer
>>>>
>>>>
>>>>
>>>> Exception in thread "main" java.lang.RuntimeException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>>> at mapred.CrunchJob.<init>(CrunchJob.java:36)
>>>> at mapred.tempjobs.DownloadFiles.<init>(DownloadFiles.java:16)
>>>> at mapred.tempjobs.DownloadFiles.main(DownloadFiles.java:20)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:483)
>>>> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
>>>> Caused by: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>>> at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
>>>> at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
>>>> at mapred.tempjobs.DownloadFiles.run(DownloadFiles.java:37)
>>>> at mapred.CrunchJob.run(CrunchJob.java:96)
>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>> at mapred.CrunchJob.<init>(CrunchJob.java:34)
>>>> ... 7 more
>>>> Caused by: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>>> at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
>>>> at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:136)
>>>> at org.apache.crunch.io.seq.SeqFileSource.read(SeqFileSource.java:43)
>>>> at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:37)
>>>> at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:76)
>>>> ... 12 more
>>>
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills
