incubator-crunch-user mailing list archives

From Gauthier AMBARD <gauthier.amb...@gmail.com>
Subject Re: CrunchRuntimeException: java.io.IOException
Date Wed, 25 Jul 2012 10:02:26 GMT
Hi,

Well, I did a fresh install from scratch and it works just fine now
(with and without modifying the pom-parent)! As HDFS was acting oddly
before the re-install, that may have caused the error.

Thanks for the help!
Gauthier


2012/7/25 Josh Wills <jwills@cloudera.com>

> Hey Gauthier,
>
> I ran this locally just now by executing the following sequence:
>
> 1) Changed the hadoop.version in the top-level crunch pom.xml to be 1.0.3.
> 2) Ran `mvn clean package`
> 3) cd examples/
> 4) ~/cdh/hadoop-1.0.3/bin/hadoop jar target/crunch-examples-0.3.0-SNAPSHOT-job.jar org.apache.crunch.examples.WordCount foo.txt out
>
> where I downloaded the version of hadoop you linked to in your previous
> email, and foo.txt was a local file I created for testing. Curious as to
> what (if anything) you did differently.
>
> J
>
> On Tue, Jul 24, 2012 at 8:54 AM, Josh Wills <jwills@cloudera.com> wrote:
>
>> Could be. I'm on the road today, but I'll take a look at it this evening.
>>
>>
>> On Tue, Jul 24, 2012 at 8:48 AM, Gauthier AMBARD <
>> gauthier.ambard@gmail.com> wrote:
>>
>>> Yep,
>>> http://apache.mirrors.multidist.eu/hadoop/common/stable/hadoop-1.0.3-bin.tar.gz
>>> and hadoop version says:
>>> Hadoop 1.0.3
>>> Subversion
>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
>>> 1335192
>>> Compiled by hortonfo on Tue May  8 20:31:25 UTC 2012
>>> From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
>>>
>>> Maybe it has to do with some configuration?
>>>
>>> Gauthier
>>>
>>>
>>> 2012/7/24 Josh Wills <jwills@cloudera.com>
>>>
>>>> Hey Gauthier,
>>>>
>>>> IIRC, that error occurs when the Hadoop version doesn't support
>>>> multiple output files, which Crunch relies on. My understanding was that
>>>> this was part of 1.0.3, viz.
>>>>
>>>>
>>>> http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
>>>>
>>>> so I'm a bit thrown: this is the Apache distro of 1.0.3, right? Not a
>>>> custom Hadoop build?
>>>>
>>>> J
>>>>
>>>> On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD <
>>>> gauthier.ambard@gmail.com> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I wanted to use Crunch, but when I tried the examples I got:
>>>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException:
>>>>> java.io.IOException: File already
>>>>> exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>>>
>>>>> I am running a git (Apache Incubator) version of Crunch (07/24/2012)
>>>>> against Hadoop 1.0.3 (maybe this is causing the error, since all the
>>>>> dependencies use Hadoop 0.20.x). Or maybe I have messed up my
>>>>> Hadoop configuration (but I can run any Hadoop example).
>>>>>
>>>>> Regards
>>>>> Gauthier
>>>>>
>>>>> Stack trace :
>>>>>
>>>>> 714  [Thread-15] INFO  org.apache.crunch.impl.mr.run.RTNode  - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
>>>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>>>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
>>>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>>>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
>>>>>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
>>>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>>>     at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
>>>>>     at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
>>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>>>> Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>>>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
>>>>>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
>>>>>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
>>>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
>>>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
>>>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
>>>>>     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>>>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
>>>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
>>>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
>>>>>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Director of Data Science
>>>> Cloudera <http://www.cloudera.com>
>>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>>
>>>>
>>>
>>
>>
>
>
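
For anyone landing on this thread with the same trace: the innermost frame shows the local filesystem refusing to create `part-r-00000` because a file with that name was already present in the task's temporary output directory. The sketch below only illustrates that "create without overwrite" behaviour, using the JDK's `java.nio.file` API as a stand-in rather than Hadoop's `FileSystem` classes. The class name `CreateNoOverwriteSketch`, the method `createNew`, and the temporary directory prefix are hypothetical names chosen for illustration. It does not explain why the file existed in the first place, which in this thread went away after a clean re-install.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: mimics "create, but fail if the file is already there"
// with plain JDK calls. This is NOT Hadoop or Crunch code.
public class CreateNoOverwriteSketch {

    /**
     * Tries to create an empty file without overwriting.
     * Returns true if the file was created, false if it already existed.
     */
    public static boolean createNew(Path file) throws IOException {
        try {
            // CREATE_NEW fails if the target already exists.
            Files.write(file, new byte[0], StandardOpenOption.CREATE_NEW);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the task attempt's temporary directory.
        Path dir = Files.createTempDirectory("crunch-sketch");
        Path part = dir.resolve("part-r-00000");

        System.out.println("first create succeeded:  " + createNew(part));
        System.out.println("second create succeeded: " + createNew(part));
    }
}
```

The second call fails for the same kind of reason as the `IOException` in the trace: a writer for the same output path is being opened twice, or a stale file from an earlier run is still in place.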
