From: Josh Wills <jwills@cloudera.com>
Date: Tue, 24 Jul 2012 08:54:17 -0700
Subject: Re: CrunchRuntimeException: java.io.IOException
To: crunch-user@incubator.apache.org

Could be. I'm on the road today, but I'll take a look at it this evening.

On Tue, Jul 24, 2012 at 8:48 AM, Gauthier AMBARD <gauthier.ambard@gmail.com> wrote:

> Yep,
> http://apache.mirrors.multidist.eu/hadoop/common/stable/hadoop-1.0.3-bin.tar.gz
> and hadoop version says:
>
> Hadoop 1.0.3
> Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
> Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
> From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
>
> Maybe it has to do with some configuration?
>
> Gauthier
>
> 2012/7/24 Josh Wills <jwills@cloudera.com>
>
>> Hey Gauthier,
>>
>> IIRC, that error occurs when the Hadoop version doesn't support multiple
>> output files, which Crunch relies on. My understanding was that this was
>> part of 1.0.3, viz.
>>
>> http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
>>
>> so I'm a bit thrown-- this is the Apache distro of 1.0.3, right? Not a
>> custom Hadoop build?
>>
>> J
>>
>> On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD <gauthier.ambard@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I wanted to use Crunch, but when I tried the examples I got:
>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>
>>> I am running a git (Apache incubator) version of Crunch (07/24/2012)
>>> against Hadoop 1.0.3 (maybe this is causing the error, since every
>>> dependency is on Hadoop 0.20.x). Or maybe I have messed up my Hadoop
>>> configuration (though I can run any Hadoop example).
>>>
>>> Regards
>>> Gauthier
>>>
>>> Stack trace:
>>>
>>> 714 [Thread-15] INFO org.apache.crunch.impl.mr.run.RTNode - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
>>>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>     at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
>>>     at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>> Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
>>>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
>>>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
>>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
>>>     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
>>>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
>>>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills

--
Director of Data Science
Cloudera
Twitter: @josh_wills
