crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: CrunchRuntimeException: java.io.IOException
Date Tue, 24 Jul 2012 15:54:17 GMT
Could be. I'm on the road today, but I'll take a look at it this evening.

On Tue, Jul 24, 2012 at 8:48 AM, Gauthier AMBARD
<gauthier.ambard@gmail.com> wrote:

> Yep, it's
> http://apache.mirrors.multidist.eu/hadoop/common/stable/hadoop-1.0.3-bin.tar.gz
> and hadoop version says:
> Hadoop 1.0.3
> Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
> Compiled by hortonfo on Tue May  8 20:31:25 UTC 2012
> From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
>
> Maybe it has to do with some configuration?
>
> Gauthier
>
>
> 2012/7/24 Josh Wills <jwills@cloudera.com>
>
>> Hey Gauthier,
>>
>> IIRC, that error occurs when the Hadoop version doesn't support multiple
>> output files, which Crunch relies on. My understanding was that this was
>> part of 1.0.3, viz.
>>
>>
>> http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
>>
>> so I'm a bit thrown-- this is the Apache distro of 1.0.3, right? Not a
>> custom Hadoop build?
>>
>> J
>>
>> On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD <
>> gauthier.ambard@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I wanted to use Crunch, but when I tried the examples I got:
>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException:
>>> java.io.IOException: File already
>>> exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>
>>> I am running a git (Apache Incubator) version of Crunch (07/24/2012)
>>> against Hadoop 1.0.3 (maybe this is causing the error, as
>>> every dependency is on Hadoop 0.20.x). Or maybe I have messed up my
>>> Hadoop configuration (but I can run any Hadoop example).
>>>
>>> Regards
>>> Gauthier
>>>
>>> Stack trace:
>>>
>>> 714  [Thread-15] INFO  org.apache.crunch.impl.mr.run.RTNode  - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
>>>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>     at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
>>>     at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>> Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
>>>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
>>>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
>>>     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
>>>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
