incubator-crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: CrunchRuntimeException: java.io.IOException
Date Tue, 24 Jul 2012 15:39:48 GMT
Hey Gauthier,

IIRC, that error occurs when the Hadoop version doesn't support multiple
output files, which Crunch relies on. My understanding was that this was
part of 1.0.3, viz.

http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

so I'm a bit thrown. This is the Apache distro of 1.0.3, right? Not a
custom Hadoop build?
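
One way to answer the "which build is this?" question is to ask the JVM where it loads the class from. The following is only a minimal sketch: the class name `HadoopClasspathProbe` and its `locate` helper are hypothetical, and the default class name is simply the one from the Javadoc URL above. It uses plain reflection, so it compiles without Hadoop, but it must be run with the same classpath as the failing job to be meaningful.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Crunch or Hadoop).
public class HadoopClasspathProbe {

    // Returns a description of where the class was loaded from,
    // or null if the class cannot be found or linked on this classpath.
    public static String locate(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class<?> c = Class.forName(className, false,
                    HadoopClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader may report no code source.
            return src == null ? "(no code source; JDK class)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Default taken from the Javadoc link in this message; pass another name to probe it.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.mapred.lib.MultipleOutputs";
        String where = locate(name);
        System.out.println(where == null
                ? name + " was not found on the classpath"
                : name + " was loaded from " + where);
    }
}
```

If the reported location is a jar from a different Hadoop release than expected, that would suggest mixed versions on the classpath. It does not by itself confirm the cause of the exception.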

J

On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD
<gauthier.ambard@gmail.com> wrote:

> Hi guys,
>
> I wanted to use Crunch, but when I tried the examples I got:
> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>
> I am running a git (Apache Incubator) version of Crunch (07/24/2012)
> against Hadoop 1.0.3. Maybe this is causing the error, since all the
> dependencies use Hadoop 0.20.x. Or maybe I have messed up my Hadoop
> configuration (but I can run any Hadoop example).
>
> Regards
> Gauthier
>
> Stack trace:
>
> 714  [Thread-15] INFO  org.apache.crunch.impl.mr.run.RTNode  - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>     at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
>     at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
>     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
