Return-Path:
X-Original-To: apmail-incubator-crunch-user-archive@minotaur.apache.org
Delivered-To: apmail-incubator-crunch-user-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 287D9DDFF
    for ; Wed, 25 Jul 2012 04:11:02 +0000 (UTC)
Received: (qmail 86265 invoked by uid 500); 25 Jul 2012 04:11:02 -0000
Delivered-To: apmail-incubator-crunch-user-archive@incubator.apache.org
Received: (qmail 86224 invoked by uid 500); 25 Jul 2012 04:11:01 -0000
Mailing-List: contact crunch-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: crunch-user@incubator.apache.org
Delivered-To: mailing list crunch-user@incubator.apache.org
Received: (qmail 86167 invoked by uid 99); 25 Jul 2012 04:11:00 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 25 Jul 2012 04:11:00 +0000
X-ASF-Spam-Status: No, hits=1.5 required=5.0
    tests=FSL_RCVD_USER,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (nike.apache.org: domain of jwills@cloudera.com designates 209.85.212.47 as permitted sender)
Received: from [209.85.212.47] (HELO mail-vb0-f47.google.com) (209.85.212.47)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 25 Jul 2012 04:10:53 +0000
Received: by vbbfr13 with SMTP id fr13so223081vbb.6
    for ; Tue, 24 Jul 2012 21:10:32 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=google.com; s=20120113;
    h=mime-version:in-reply-to:references:from:date:message-id:subject:to
     :content-type:x-gm-message-state;
    bh=QTRpekqujpSLgF8sKv2emSmDZ6UGWsoJlzIa27x28Sg=;
    b=FgQH1Za6TIht7/os9Jc+00OUrOOIlTIvzKjl8lt2960OBkwoTcqktrq4TZsqhHEpnI
     R1hULK3skeW4pFdxbsO7MWmZIufYQYnH+YSEsAoXCVKan+sfz2Saru15vvG4BEPX9aYg
     mRdRU8Jpnp9/5fMSd5gSxMbDlHlgNGoSpKuQ5mmZOaLBZj5LJCh0mHVwooJOB3OllrgP
     kR43ul0MCczvd45IzUzBfL4c806cAyjlS0pIkmMwJ7xvsJhxGmvEy4gh3CbkQpYBr4g6
     HS28g7MqQizKUpQu58/Dzzdp5EtsZwU5SCApocIe6E4bAM29m6ZTLiiwEd/IRophwSB0
     jIuA==
Received: by 10.220.220.132 with SMTP id hy4mr17070363vcb.33.1343189432589;
    Tue, 24 Jul 2012 21:10:32 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.59.10.42 with HTTP; Tue, 24 Jul 2012 21:10:12 -0700 (PDT)
In-Reply-To:
References:
From: Josh Wills
Date: Tue, 24 Jul 2012 21:10:12 -0700
Message-ID:
Subject: Re: CrunchRuntimeException: java.io.IOException
To: crunch-user@incubator.apache.org
Content-Type: multipart/alternative; boundary=14dae9cfc7e0fede7a04c59fa8d4
X-Gm-Message-State: ALoCoQmblkMsXo4azoWV6tb7nIhg83Sok0XE7poQMhal5OCh3Qj9EBmVg6sQizq2+GtI2ESJYbrm

--14dae9cfc7e0fede7a04c59fa8d4
Content-Type: text/plain; charset=ISO-8859-1

Hey Gauthier,

I ran this locally just now by executing the following sequence:

1) Changed the hadoop.version in the top-level crunch pom.xml to be 1.0.3.
2) Ran `mvn clean package`
3) cd examples/
4) ~/cdh/hadoop-1.0.3/bin/hadoop jar target/crunch-examples-0.3.0-SNAPSHOT-job.jar org.apache.crunch.examples.WordCount foo.txt out

where I downloaded the version of hadoop you linked to in your previous
email, and foo.txt was a local file I created for testing. Curious as to
what (if anything) you did differently.

J

On Tue, Jul 24, 2012 at 8:54 AM, Josh Wills wrote:

> Could be. I'm on the road today, but I'll take a look at it this evening.
>
>
> On Tue, Jul 24, 2012 at 8:48 AM, Gauthier AMBARD <gauthier.ambard@gmail.com> wrote:
>
>> Yep,
>> http://apache.mirrors.multidist.eu/hadoop/common/stable/hadoop-1.0.3-bin.tar.gz and
>> hadoop version says :
>> Hadoop 1.0.3
>> Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
>> Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
>> From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
>>
>> Maybe it has to do with some configuration ?
>>
>> Gauthier
>>
>>
>> 2012/7/24 Josh Wills
>>
>>> Hey Gauthier,
>>>
>>> IIRC, that error occurs when the Hadoop version doesn't support multiple
>>> output files, which Crunch relies on. My understanding was that this was
>>> part of 1.0.3, viz.
>>>
>>> http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
>>>
>>> so I'm a bit thrown-- this is the Apache distro of 1.0.3, right? Not a
>>> custom Hadoop build?
>>>
>>> J
>>>
>>> On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD <gauthier.ambard@gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I wanted to use crunch, but when I tried the examples I got
>>>> : org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>>
>>>> I am running a git (apache incubator) version of crunch (07/24/2012)
>>>> against a 1.0.3 hadoop (maybe this is causing the error,
>>>> every dependencies are with 0.20.x hadoop). Or maybe I have messed with my
>>>> hadoop configuration (but I can run any hadoop example).
>>>>
>>>> Regards
>>>> Gauthier
>>>>
>>>> Stack trace :
>>>>
>>>> 714 [Thread-15] INFO org.apache.crunch.impl.mr.run.RTNode - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
>>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>> at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
>>>> at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>> at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>> at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>> at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>> at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>> at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>> at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
>>>> at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
>>>> at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>> at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>> at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>> at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>> at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
>>>> at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
>>>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>>> Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
>>>> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
>>>> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
>>>> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
>>>> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
>>>> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
>>>> at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>>>> at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
>>>> at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
>>>> at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
>>>> at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera
>>> Twitter: @josh_wills
>>>
>>>
>>
>
>
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
>
>

--
Director of Data Science
Cloudera
Twitter: @josh_wills

--14dae9cfc7e0fede7a04c59fa8d4
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hey Gauthier,

I ran this locally just now by executing the following sequence:

1) Changed the hadoop.version in the top-level crunch pom.xml to be 1.0.3.
2) Ran `mvn clean package`
3) cd examples/
4) ~/cdh/hadoop-1.0.3/bin/hadoop jar target/crunch-examples-0.3.0-SNAPSHOT-job.jar org.apache.crunch.examples.WordCount foo.txt out

where I downloaded the version of hadoop you linked to in your previous email, and foo.txt was a local file I created for testing. Curious as to what (if anything) you did differently.
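
For anyone retracing the four steps above, they can be sketched as a small script. This is only a sketch under assumptions: the `HADOOP_HOME` default, the function names, and the use of `-Dhadoop.version` (which overrides the property on the command line for one build, rather than editing pom.xml as in step 1) are not from this thread, and the override only works if the top-level pom.xml defines `hadoop.version` as an ordinary Maven property.

```shell
#!/bin/sh
# Hypothetical helper script; paths and defaults are assumptions, not from the thread.
HADOOP_VERSION="${HADOOP_VERSION:-1.0.3}"
HADOOP_HOME="${HADOOP_HOME:-$HOME/cdh/hadoop-$HADOOP_VERSION}"
EXAMPLES_JAR="target/crunch-examples-0.3.0-SNAPSHOT-job.jar"

# Steps 1-2: build Crunch against the chosen Hadoop version.
# -Dhadoop.version overrides the pom.xml property for this build only.
build_cmd() {
  printf 'mvn clean package -Dhadoop.version=%s\n' "$HADOOP_VERSION"
}

# Step 4: the WordCount invocation for input $1 and output directory $2.
wordcount_cmd() {
  printf '%s/bin/hadoop jar %s org.apache.crunch.examples.WordCount %s %s\n' \
    "$HADOOP_HOME" "$EXAMPLES_JAR" "$1" "$2"
}

# Print the commands instead of running them, so they can be reviewed first.
build_cmd
echo "cd examples/"
wordcount_cmd foo.txt out
```

The script only prints the commands; piping its output to `sh` from the top of a Crunch checkout would run them in order.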

J

On Tue, Jul 24, 2012 at 8:54 AM, Josh Wills <jwills@cloudera.com> wrote:
Could be. I'm on the road today, but I'll take a look at it this evening.


On Tue, Jul 24, 2012 at 8:48 AM, Gauthier AMBARD <gauthier.ambard@gmail.com> wrote:
Yep, http://apache.mirrors.multidist.eu/hadoop/common/stable/hadoop-1.0.3-bin.tar.gz and hadoop version says :
Hadoop 1.0.3
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be

Maybe it has to do with some configuration ?
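
The `hadoop version` check quoted above could also be scripted, which helps when several Hadoop installs are on the same machine. A minimal sketch, assuming the first line of the output has the form `Hadoop 1.0.3` as shown in this message; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the version number from `hadoop version` output.
# Reads the command output on stdin and prints what follows "Hadoop " on line 1.
hadoop_version_from_output() {
  sed -n '1s/^Hadoop //p'
}

# Compare the reported version with the one Crunch was built against ($1).
check_hadoop_version() {
  expected="$1"
  actual="$(hadoop_version_from_output)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: Hadoop $actual"
  else
    echo "mismatch: found '$actual', expected '$expected'"
    return 1
  fi
}
```

Usage would be along the lines of `hadoop version | check_hadoop_version 1.0.3`, run with the same `hadoop` binary that launches the job.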

Gauthier


2012/7/24 Josh Wills <jwills@cloudera.com>
Hey Gauthier,

IIRC, that error occurs when the Hadoop version doesn't support multiple output files, which Crunch relies on. My understanding was that this was part of 1.0.3, viz.

http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

so I'm a bit thrown-- this is the Apache distro of 1.0.3, right? Not a custom Hadoop build?

J

On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD <gauthier.ambard@gmail.com> wrote:
Hi guys,

I wanted to use crunch, but when I tried the examples I got : org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000

I am running a git (apache incubator) version of crunch (07/24/2012) against a 1.0.3 hadoop (maybe this is causing the error, every dependencies are with 0.20.x hadoop). Or maybe I have messed with my hadoop configuration (but I can run any hadoop example).

Regards
Gauthier

Stack trace :

714 [Thread-15] INFO org.apache.crunch.impl.mr.run.RTNode - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
at org.apache.crunch.MapFn.process(MapFn.java:34)
at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
at org.apache.crunch.MapFn.process(MapFn.java:34)
at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
at org.apache.crunch.MapFn.process(MapFn.java:34)
at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)



--
Director of Data Science
Cloudera
Twitter: @josh_wills

--
Director of Data Science
Cloudera
Twitter: @josh_wills

--
Director of Data Science
Cloudera
Twitter: @josh_wills

--14dae9cfc7e0fede7a04c59fa8d4--