Subject: Re: mr.MRPipeline error running job : java.io.IOException: No such file or directory
From: Quentin Ambard <quentin.ambard@gmail.com>
To: user@crunch.apache.org
Date: Wed, 15 May 2013 17:02:23 +0200

Well, I finally found out: my jar was owned by a different unix user than
the one launching the job. That's why I had the error!

2013/5/15 Quentin Ambard <quentin.ambard@gmail.com>

> Hi,
> Thanks for your answers.
> - The crunch tmp dir permissions are fine; crunch creates a new folder
> inside it every time I launch the batch.
> - The crunch example jar (wordcount) works.
> - The HBase connection is OK (I can scan the table).
> - I updated to crunch 0.6.0, logs are enabled, and I now have more
> information about the error.
> It looks like it can't find some HBase dependency jars to ship them to
> the cluster. I think all the necessary dependencies are packaged inside
> my jar. All hadoop dependencies come from the Cloudera repository (so I
> don't think it's a version issue).
>
> Any ideas?
>
> Exception in thread "main" org.apache.crunch.CrunchRuntimeException:
> java.io.IOException: java.lang.RuntimeException: java.io.IOException: No
> such file or directory
>     at org.apache.crunch.impl.mr.MRPipeline.plan(MRPipeline.java:153)
>     at org.apache.crunch.impl.mr.MRPipeline.runAsync(MRPipeline.java:172)
>     at org.apache.crunch.impl.mr.MRPipeline.run(MRPipeline.java:160)
>     at org.apache.crunch.impl.mr.MRPipeline.done(MRPipeline.java:181)
>     at com.myprocurement.crunch.job.extractor.ExtractAndConcatJob.run(ExtractAndConcatJob.java:102)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at com.myprocurement.crunch.job.fullpage.CrunchLauncher.launch(CrunchLauncher.java:40)
>     at com.myprocurement.crunch.BatchMain.main(BatchMain.java:31)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.io.IOException: java.lang.RuntimeException:
> java.io.IOException: No such file or directory
>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.findOrCreateJar(TableMapReduceUtil.java:521)
>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:472)
>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:438)
>     at org.apache.crunch.io.hbase.HBaseSourceTarget.configureSource(HBaseSourceTarget.java:100)
>     at org.apache.crunch.impl.mr.plan.JobPrototype.build(JobPrototype.java:192)
>     at org.apache.crunch.impl.mr.plan.JobPrototype.getCrunchJob(JobPrototype.java:123)
>     at org.apache.crunch.impl.mr.plan.MSCRPlanner.plan(MSCRPlanner.java:159)
>     at org.apache.crunch.impl.mr.MRPipeline.plan(MRPipeline.java:151)
>     ... 12 more
> Caused by: java.lang.RuntimeException: java.io.IOException: No such file
> or directory
>     at org.apache.hadoop.util.JarFinder.getJar(JarFinder.java:164)
>     at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.findOrCreateJar(TableMapReduceUtil.java:518)
>     ... 19 more
> Caused by: java.io.IOException: No such file or directory
>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>     at java.io.File.checkAndCreate(File.java:1705)
>     at java.io.File.createTempFile0(File.java:1726)
>     at java.io.File.createTempFile(File.java:1803)
>     at org.apache.hadoop.util.JarFinder.getJar(JarFinder.java:156)
>     ... 23 more
>
> 2013/5/13 Josh Wills <jwills@cloudera.com>
>
>> It does sound like a permission issue -- you can set the crunch.tmp.dir
>> property on the command line (assuming you're implementing the Tool
>> interface) by setting -Dcrunch.tmp.dir=... to see if that helps.
>>
>> On Mon, May 13, 2013 at 5:15 AM, Christian Tzolov <tzolov@apache.org> wrote:
>>
>>> You can try MRPipeline.enableDebug() to lower the log level.
>>>
>>> On Mon, May 13, 2013 at 12:06 PM, Quentin Ambard <
>>> quentin.ambard@gmail.com> wrote:
>>>
>>>> The problem is that I don't see my job on the JobTracker page. It's
>>>> like the job doesn't even start!
>>>> Is there a way to raise the log level to get more information on the
>>>> error?
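The two suggestions above combine naturally: run the driver through
ToolRunner so that a command-line -Dcrunch.tmp.dir=... lands in the job's
Configuration before the pipeline is built, and call enableDebug() on the
pipeline for more verbose planner and task logging. A minimal sketch,
assuming a hypothetical driver class, job, and input/output arguments
(MRPipeline, enableDebug(), and ToolRunner are the real APIs named above):

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: implementing Tool lets ToolRunner parse -D flags
// (e.g. -Dcrunch.tmp.dir=/some/path) into the Configuration.
public class DebugDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains any -Dkey=value flags from the command line.
    Pipeline pipeline = new MRPipeline(DebugDriver.class, getConf());

    // Per the suggestion above: enable verbose (DEBUG) logging for the
    // planner and the generated MapReduce jobs.
    pipeline.enableDebug();

    // Stand-in workload; the thread's real job reads and writes HBase.
    PCollection<String> lines = pipeline.readTextFile(args[0]);
    pipeline.writeTextFile(lines, args[1]);

    // done() runs any remaining jobs and cleans up temporary output.
    return pipeline.done().succeeded() ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // Usage (hypothetical paths):
    //   hadoop jar my-job.jar DebugDriver -Dcrunch.tmp.dir=/tmp/crunch in out
    System.exit(ToolRunner.run(new Configuration(), new DebugDriver(), args));
  }
}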
>>>>
>>>> 2013/5/12 Josh Wills <jwills@cloudera.com>
>>>>
>>>>> Something probably failed in the MapReduce job itself, which meant
>>>>> that there weren't any outputs for Crunch to move around. What do the
>>>>> error logs for the individual tasks look like on the JobTracker
>>>>> status page(s)?
>>>>>
>>>>> On Sat, May 11, 2013 at 5:02 PM, Quentin Ambard <
>>>>> quentin.ambard@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm running a simple Crunch job on hadoop cdh 4.1.2.
>>>>>> The job is quite simple: it scans an HBase table, extracts some data
>>>>>> from each entry, groups the results by key, combines them using an
>>>>>> aggregator, then writes them back to another HBase table.
>>>>>> It works fine on my computer; however, when I try to launch it on my
>>>>>> hadoop cluster I get the following:
>>>>>>
>>>>>> >>hadoop jar uber-crunch-1.0-SNAPSHOT.jar description /home/quentin/default.properties
>>>>>> 13/05/12 01:57:50 INFO support.ClassPathXmlApplicationContext: Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@1f4384c2: startup date [Sun May 12 01:57:50 CEST 2013]; root of context hierarchy
>>>>>> 13/05/12 01:57:50 INFO xml.XmlBeanDefinitionReader: Loading XML bean definitions from class path resource [context/job-description-context.xml]
>>>>>> 13/05/12 01:57:50 INFO xml.XmlBeanDefinitionReader: Loading XML bean definitions from class path resource [context/default-batch-context.xml]
>>>>>> 13/05/12 01:57:51 INFO annotation.ClassPathBeanDefinitionScanner: JSR-330 'javax.inject.Named' annotation found and supported for component scanning
>>>>>> 13/05/12 01:57:51 INFO config.PropertyPlaceholderConfigurer: Loading properties file from URL [file:/tmp/hadoop-hdfs/hadoop-unjar7637839123250781784/default.properties]
>>>>>> 13/05/12 01:57:51 INFO config.PropertyPlaceholderConfigurer: Loading properties file from URL [jar:file:/home/quentin/uber-crunch-1.0-SNAPSHOT.jar!/default.properties]
>>>>>> 13/05/12 01:57:51 INFO config.PropertyPlaceholderConfigurer: Loading properties file from URL [file:/home/quentin/default.properties]
>>>>>> 13/05/12 01:57:51 INFO annotation.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
>>>>>> 13/05/12 01:57:51 INFO support.DefaultListableBeanFactory: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@5b7b0998: defining beans [org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0,applicationContextHolder,descriptionLauncher,descriptionExtractor,emailExtractor,rawTextExtractor,keywordsExtractor,org.springframework.context.annotation.internalConfigurationAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor,org.springframework.context.annotation.internalCommonAnnotationProcessor,org.springframework.context.annotation.ConfigurationClassPostProcessor$ImportAwareBeanPostProcessor#0]; root of factory hierarchy
>>>>>> 13/05/12 01:57:52 INFO hbase.HBaseTarget: HBaseTarget ignores checks for existing outputs...
>>>>>> 13/05/12 01:57:53 INFO collect.PGroupedTableImpl: Setting num reduce tasks to 2
>>>>>> 13/05/12 01:57:53 ERROR mr.MRPipeline: org.apache.crunch.CrunchRuntimeException: java.io.IOException: java.lang.RuntimeException: java.io.IOException: No such file or directory
>>>>>> 13/05/12 01:57:53 WARN mr.MRPipeline: Not running cleanup while output targets remain
>>>>>>
>>>>>> Any idea of the origin of the problem? Maybe it's something with
>>>>>> permissions or a crunch tmp file, but I can't find out where it
>>>>>> comes from.
>>>>>>
>>>>>> Thanks for your help
>>>>>>
>>>>>> Quentin
>>>>>
>>>>> --
>>>>> Director of Data Science
>>>>> Cloudera
>>>>> Twitter: @josh_wills
>>>>
>>>> --
>>>> Quentin Ambard
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills
>
> --
> Quentin Ambard

--
Quentin Ambard
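For anyone hitting the same trace: the failure sits in JarFinder.getJar(),
which TableMapReduceUtil.addDependencyJars() uses to locate (or build) a
jar for each dependency class before shipping it to the cluster. A
hypothetical diagnostic along these lines can surface ownership and
permission problems like the one resolved at the top of this thread. The
class passed to inspect() is only an example, and the temp-file probe
exercises the JVM default temp dir, which may differ from the directory
JarFinder actually targets:

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;

public class JarDiagnostics {

  // Print the jar (or class directory) a class resolves from -- the path
  // JarFinder.getJar() hands to TableMapReduceUtil when the class is
  // already packaged -- plus its owner and readability for the current user.
  static void inspect(Class<?> klass) throws Exception {
    CodeSource cs = klass.getProtectionDomain().getCodeSource();
    if (cs == null) {
      // Bootstrap classes (java.lang.*, etc.) have no code source.
      System.out.println(klass.getName() + " -> bootstrap class loader");
      return;
    }
    Path jar = Paths.get(cs.getLocation().toURI());
    System.out.printf("%s -> %s (owner=%s, readable=%s)%n",
        klass.getName(), jar, Files.getOwner(jar), Files.isReadable(jar));
  }

  public static void main(String[] args) throws Exception {
    // Example class; in the thread's setup this would be run over the HBase
    // classes the job depends on, e.g. org.apache.hadoop.hbase.client.Scan.
    inspect(JarDiagnostics.class);

    // When a class is not already in a jar, JarFinder.getJar() builds one
    // via File.createTempFile(); the "No such file or directory" in the
    // stack trace came from that call, so verify temp-file creation works
    // for the user running `hadoop jar`.
    File probe = File.createTempFile("jar-probe", ".tmp");
    System.out.println("temp file creation OK: " + probe);
    probe.delete();
  }
}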