Subject: Re: Job stuck in running state on Hadoop 2.2.0
From: Silvina Caíno Lores <silvi.caino@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 11 Dec 2013 10:32:32 +0100

OK, that was indeed a classpath issue, which I solved by directly exporting
the output of hadoop classpath (i.e. the list of needed jars, see this) into
HADOOP_CLASSPATH in hadoop-env.sh and yarn-env.sh.

With this fixed, the stuck issue came back, so I will study Adam's suggestion.

On 11 December 2013 10:01, Silvina Caíno Lores wrote:

> Actually now it seems to be running (or at least attempting to run), but I
> get further errors:
>
> hadoop jar
> ~/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
> pi 1 100
>
> INFO mapreduce.Job: Job job_1386751964857_0001 failed with state FAILED
> due to: Application application_1386751964857_0001 failed 2 times due to AM
> Container for appattempt_1386751964857_0001_000002 exited with exitCode: 1
> due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:504)
> at org.apache.hadoop.util.Shell.run(Shell.java:417)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:636)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
>
> I guess it is some sort of classpath issue because of this log:
>
> /scratch/HDFS-scaino-2/logs/application_1386751964857_0001/container_1386751964857_0001_01_000001$ cat stderr
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.service.CompositeService
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 13 more
>
> I haven't found a solution yet, even though the classpath looks fine:
>
> hadoop classpath
>
> /home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
>
> Could that be related to the previous launch errors?
>
> Thanks in advance :)
>
>
> On 11 December 2013 00:29, Adam Kawa wrote:
>
>> It sounds like the job was successfully submitted to the cluster, but
>> there was some problem when starting/running the AM, so that no progress is
>> made. It happened to me once, when I was playing with YARN on a cluster
>> consisting of very small machines, and I misconfigured YARN to allocate
>> more memory to the AM than the actual memory available on any machine on my
>> cluster.
>> As a result, the RM was not able to start the AM anywhere, due to being
>> unable to find a big enough container.
>>
>> Could you show the logs from the job? The link should be available on
>> your console after you submit a job, e.g.
>> 13/12/10 10:41:21 INFO mapreduce.Job: The url to track the job:
>> http://compute-7-2:8088/proxy/application_1386668372725_0001/
>>
>>
>> 2013/12/10 Silvina Caíno Lores
>>
>>> Thank you! I realized that, although I exported the variables in the
>>> scripts, there were a few errors and my desired configuration wasn't being
>>> used (which explained other strange behavior).
>>>
>>> However, I'm still getting the same issue with the examples, for
>>> instance:
>>>
>>> hadoop jar
>>> ~/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
>>> pi 1 100
>>> Number of Maps = 1
>>> Samples per Map = 100
>>> 13/12/10 10:41:18 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> Wrote input for Map #0
>>> Starting Job
>>> 13/12/10 10:41:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
>>> 13/12/10 10:41:20 INFO input.FileInputFormat: Total input paths to process : 1
>>> 13/12/10 10:41:20 INFO mapreduce.JobSubmitter: number of splits:1
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
>>> 13/12/10 10:41:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1386668372725_0001
>>> 13/12/10 10:41:20 INFO impl.YarnClientImpl: Submitted application application_1386668372725_0001 to ResourceManager at /0.0.0.0:8032
>>> 13/12/10 10:41:21 INFO mapreduce.Job: The url to track the job:
>>> http://compute-7-2:8088/proxy/application_1386668372725_0001/
>>> 13/12/10 10:41:21 INFO mapreduce.Job: Running job: job_1386668372725_0001
>>> 13/12/10 10:41:31 INFO mapreduce.Job: Job job_1386668372725_0001 running in uber mode : false
>>> 13/12/10 10:41:31 INFO mapreduce.Job: map 0% reduce 0%
>>> ---- stuck here ----
>>>
>>>
>>> I hope the problem is not in the environment files. I have the following
>>> at the beginning of hadoop-env.sh:
>>>
>>> # The java implementation to use.
>>> export JAVA_HOME=/home/software/jdk1.7.0_25/
>>>
>>> # The jsvc implementation to use. Jsvc is required to run secure datanodes.
>>> #export JSVC_HOME=${JSVC_HOME}
>>>
>>> export HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
>>>
>>> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
>>> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
>>> export HADOOP_CONF_DIR=$HADOOP_INSTALL"/etc/hadoop"
>>>
>>>
>>> and this in yarn-env.sh:
>>>
>>> export JAVA_HOME=/home/software/jdk1.7.0_25/
>>>
>>> export HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
>>>
>>> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
>>> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
>>> export HADOOP_CONF_DIR=$HADOOP_INSTALL"/etc/hadoop"
>>>
>>>
>>> Not sure what to do about HADOOP_YARN_USER though, since I don't have a
>>> dedicated user to run the daemons.
>>>
>>> Thanks!
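[For reference, the env additions discussed in this thread can be sketched as one block shared by hadoop-env.sh and yarn-env.sh. The paths are this cluster's (adjust to yours), and the HADOOP_CLASSPATH line is the fix described at the top of the thread:]

```shell
# Sketch of the env additions from this thread; the paths below are this
# cluster's, so adjust them. The same block goes into both hadoop-env.sh
# and yarn-env.sh.
export JAVA_HOME=/home/software/jdk1.7.0_25/
export HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop

# The fix from the first message: export the full jar list so AM containers
# can resolve classes such as org.apache.hadoop.service.CompositeService.
# (Empty if the hadoop binary is not on PATH yet.)
export HADOOP_CLASSPATH=$(hadoop classpath 2>/dev/null)
```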
>>>
>>> On 10 December 2013 10:10, Taka Shinagawa wrote:
>>>
>>>> I had a similar problem after setting up Hadoop 2.2.0 based on the
>>>> instructions at
>>>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>>>
>>>> Although it's not documented on the page, I needed to
>>>> edit hadoop-env.sh and yarn-env.sh as well to update
>>>> JAVA_HOME, HADOOP_CONF_DIR, HADOOP_YARN_USER and YARN_CONF_DIR.
>>>>
>>>> Once these variables were set, I was able to run the example
>>>> successfully.
>>>>
>>>>
>>>> On Mon, Dec 9, 2013 at 11:37 PM, Silvina Caíno Lores <
>>>> silvi.caino@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm having trouble running the Hadoop examples on a single node. All
>>>>> the executions get stuck in the running state at 0% map and reduce, and the
>>>>> logs don't seem to indicate any issue, besides the need to kill the node
>>>>> manager:
>>>>>
>>>>> compute-0-7-3: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
>>>>>
>>>>> RM
>>>>>
>>>>> 2013-12-09 11:52:22,466 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1386585879247_0001_01_000001 : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
>>>>> 2013-12-09 11:52:22,882 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1386585879247_0001_01_000001, NodeId: compute-0-7-3:8010, NodeHttpAddress: compute-0-7-3:8042, Resource: <memory:2000, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.7.3:8010 }, ] for AM appattempt_1386585879247_0001_000001
>>>>> 2013-12-09 11:52:22,883 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1386585879247_0001_000001 State change from ALLOCATED to LAUNCHED
>>>>> 2013-12-09 11:52:23,371 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1386585879247_0001_01_000001 Container Transitioned from ACQUIRED to RUNNING
>>>>> 2013-12-09 11:52:30,922 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1386585879247_0001_000001 (auth:SIMPLE)
>>>>> 2013-12-09 11:52:30,938 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM registration appattempt_1386585879247_0001_000001
>>>>> 2013-12-09 11:52:30,939 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=scaino IP=10.0.7.3 OPERATION=Register App Master TARGET=ApplicationMasterService RESULT=SUCCESS APPID=application_1386585879247_0001 APPATTEMPTID=appattempt_1386585879247_0001_000001
>>>>> 2013-12-09 11:52:30,941 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1386585879247_0001_000001 State change from LAUNCHED to RUNNING
>>>>> 2013-12-09 11:52:30,941 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1386585879247_0001 State change from ACCEPTED to RUNNING
>>>>>
>>>>>
>>>>> NM
>>>>>
>>>>> 2013-12-10 08:26:02,100 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1386585879247_0001
>>>>> 2013-12-10 08:26:02,102 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /scratch/HDFS-scaino-2/tmp/nm-local-dir/usercache/scaino/appcache/application_1386585879247_0001
>>>>> 2013-12-10 08:26:02,103 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1386585879247_0001
>>>>> 2013-12-10 08:26:02,110 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1386585879247_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
>>>>> 2013-12-10 08:26:02,157 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1386585879247_0001, with delay of 10800 seconds
>>>>> 2013-12-10 08:26:04,688 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1386585879247_0001_01_000001
>>>>> 2013-12-10 08:26:05,838 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Done waiting for Applications to be Finished. Still alive: [application_1386585879247_0001]
>>>>> 2013-12-10 08:26:05,839 INFO org.apache.hadoop.ipc.Server: Stopping server on 8010
>>>>> 2013-12-10 08:26:05,846 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8010
>>>>> 2013-12-10 08:26:05,847 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
>>>>>
>>>>> I tried the pi and wordcount examples with the same results; any ideas on
>>>>> how to debug this?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Regards,
>>>>> Silvina Caíno
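[When a job hangs like this, the per-container stderr files under the NodeManager log directory, like the one quoted earlier in the thread, are usually the fastest signal. A minimal helper for scanning them might look like the sketch below; the function name and log root are hypothetical, and the real root is whatever yarn.nodemanager.log-dirs points at on each node:]

```shell
# find_failed_containers: hypothetical helper that lists container stderr
# files mentioning the two most common AM-launch failures seen in this
# thread: a broken classpath (NoClassDefFoundError) or an over-allocated
# container (OutOfMemoryError).
find_failed_containers() {
    # $1: NodeManager log root, e.g. /scratch/HDFS-scaino-2/logs
    grep -r -l --include=stderr \
        -e NoClassDefFoundError -e OutOfMemoryError \
        "$1" 2>/dev/null || true
}

# Usage (example path): find_failed_containers /scratch/HDFS-scaino-2/logs
```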