incubator-crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: CrunchRuntimeException: java.io.IOException
Date Wed, 25 Jul 2012 04:10:12 GMT
Hey Gauthier,

I ran this locally just now by executing the following sequence:

1) Changed the hadoop.version in the top-level crunch pom.xml to be 1.0.3.
2) Ran `mvn clean package`
3) cd examples/
4) ~/cdh/hadoop-1.0.3/bin/hadoop jar target/crunch-examples-0.3.0-SNAPSHOT-job.jar org.apache.crunch.examples.WordCount foo.txt out

Here ~/cdh/hadoop-1.0.3 is the version of Hadoop you linked to in your
previous email, and foo.txt is a local file I created for testing. I'm
curious what (if anything) you did differently.
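
The steps above can be sketched as a small script. This is only a sketch: it assumes the top-level pom.xml declares the version as a <hadoop.version>...</hadoop.version> property, and set_hadoop_version and build_and_run_wordcount are hypothetical helper names, not anything shipped with Crunch.

```shell
#!/bin/sh
# Sketch of steps 1-4. Assumption: the top-level pom.xml contains a
# <hadoop.version>...</hadoop.version> property element.

# Hypothetical helper: rewrite the hadoop.version property in a pom.xml.
#   $1 = path to pom.xml, $2 = Hadoop version to build against
set_hadoop_version() {
  pom="$1"
  version="$2"
  sed "s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>${version}</hadoop.version>|" \
    "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Hypothetical helper: build Crunch and run the WordCount example.
#   $1 = Crunch checkout, $2 = Hadoop install (e.g. ~/cdh/hadoop-1.0.3)
# Expects foo.txt to exist in the examples/ directory.
build_and_run_wordcount() {
  crunch_dir="$1"
  hadoop_home="$2"
  set_hadoop_version "$crunch_dir/pom.xml" 1.0.3 || return 1
  (cd "$crunch_dir" && mvn clean package) || return 1
  (cd "$crunch_dir/examples" && \
    "$hadoop_home/bin/hadoop" jar \
      target/crunch-examples-0.3.0-SNAPSHOT-job.jar \
      org.apache.crunch.examples.WordCount foo.txt out)
}
```

With both functions sourced, something like `build_and_run_wordcount ~/src/crunch ~/cdh/hadoop-1.0.3` would repeat the sequence; the ~/src/crunch path is a placeholder for wherever the checkout lives.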

J

On Tue, Jul 24, 2012 at 8:54 AM, Josh Wills <jwills@cloudera.com> wrote:

> Could be. I'm on the road today, but I'll take a look at it this evening.
>
>
> On Tue, Jul 24, 2012 at 8:48 AM, Gauthier AMBARD <
> gauthier.ambard@gmail.com> wrote:
>
>> Yep, it's
>> http://apache.mirrors.multidist.eu/hadoop/common/stable/hadoop-1.0.3-bin.tar.gz
>> and `hadoop version` says:
>> Hadoop 1.0.3
>> Subversion
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
>> 1335192
>> Compiled by hortonfo on Tue May  8 20:31:25 UTC 2012
>> From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
>>
>> Maybe it has to do with my configuration?
>>
>> Gauthier
>>
>>
>> 2012/7/24 Josh Wills <jwills@cloudera.com>
>>
>>> Hey Gauthier,
>>>
>>> IIRC, that error occurs when the Hadoop version doesn't support multiple
>>> output files, which Crunch relies on. My understanding was that this was
>>> part of 1.0.3, viz.
>>>
>>>
>>> http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
>>>
>>> so I'm a bit thrown-- this is the Apache distro of 1.0.3, right? Not a
>>> custom Hadoop build?
>>>
>>> J
>>>
>>> On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD <
>>> gauthier.ambard@gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I wanted to use Crunch, but when I tried the examples I got:
>>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>>
>>>> I am running a git (Apache incubator) checkout of Crunch (07/24/2012)
>>>> against Hadoop 1.0.3. Maybe this is causing the error, since all the
>>>> dependencies are built against Hadoop 0.20.x. Or maybe I have messed up my
>>>> Hadoop configuration (but I can run any of the Hadoop examples).
>>>>
>>>> Regards
>>>> Gauthier
>>>>
>>>> Stack trace :
>>>>
>>>> 714  [Thread-15] INFO  org.apache.crunch.impl.mr.run.RTNode  - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
>>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
>>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
>>>>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
>>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>>     at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
>>>>     at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>>> Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
>>>>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
>>>>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
>>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
>>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
>>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
>>>>     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
>>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
>>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
>>>>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>
>>>
>>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
