Subject: Re: No job can run in YARN (Hadoop-2.2)
From: Tao Xiao <xiaotao.cs.nju@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 12 May 2014 14:09:37 +0800

The *FileNotFoundException* was thrown when I tried to submit a job
calculating PI. No such exception is thrown when I submit a wordcount job,
but I still see "Exception from container-launch... ", and any other job
throws the same exception.

Every job runs successfully once I comment out the properties
*mapreduce.map.java.opts* and *mapreduce.reduce.java.opts*.

It does sound odd, but I suspect these two properties conflict with other
memory-related properties, so the container cannot be launched. (A sketch of
what mutually consistent settings could look like is at the end of this
message.)


2014-05-12 3:37 GMT+08:00 Jay Vyas <jayunit100@gmail.com>:

> Sounds odd.... So (1) you got a FileNotFoundException and (2) you fixed it
> by commenting out memory-specific config parameters?
>
> Not sure how that would work... Any other details or am I missing
> something else?
>
> On May 11, 2014, at 4:16 AM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:
>
> I'm sure this problem is caused by incorrect configuration. I commented
> out all the configurations regarding memory, and then jobs could run
> successfully.
>
>
> 2014-05-11 0:01 GMT+08:00 Tao Xiao <xiaotao.cs.nju@gmail.com>:
>
>> I installed Hadoop-2.2 on a cluster of 4 nodes, following "Hadoop YARN
>> Installation: The definitive guide".
>>
>> The configurations are as follows: ~/.bashrc, core-site.xml,
>> hdfs-site.xml, mapred-site.xml, slaves, yarn-site.xml.
>>
>> I started the NameNode, DataNodes, ResourceManager and NodeManagers
>> successfully, but no job can run successfully. For example, I ran the
>> following job:
>>
>> [root@Single-Hadoop ~]# yarn jar /var/soft/apache/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 4
>>
>> The output is as follows:
>>
>> 14/05/10 23:56:25 INFO mapreduce.Job: Task Id : attempt_1399733823963_0004_m_000000_0, Status : FAILED
>> Exception from container-launch:
>> org.apache.hadoop.util.Shell$ExitCodeException:
>>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>         at java.lang.Thread.run(Thread.java:662)
>>
>> 14/05/10 23:56:25 INFO mapreduce.Job: Task Id : attempt_1399733823963_0004_m_000001_0, Status : FAILED
>> Exception from container-launch:
>> org.apache.hadoop.util.Shell$ExitCodeException:
>> [identical stack trace as above]
>>
>> ... ...
>>
>> 14/05/10 23:56:36 INFO mapreduce.Job:  map 100% reduce 100%
>> 14/05/10 23:56:37 INFO mapreduce.Job: Job job_1399733823963_0004 failed with state FAILED due to: Task failed task_1399733823963_0004_m_000000
>> Job failed as tasks failed.
>> failedMaps:1 failedReduces:0
>>
>> 14/05/10 23:56:37 INFO mapreduce.Job: Counters: 10
>>         Job Counters
>>                 Failed map tasks=7
>>                 Killed map tasks=1
>>                 Launched map tasks=8
>>                 Other local map tasks=6
>>                 Data-local map tasks=2
>>                 Total time spent by all maps in occupied slots (ms)=21602
>>                 Total time spent by all reduces in occupied slots (ms)=0
>>         Map-Reduce Framework
>>                 CPU time spent (ms)=0
>>                 Physical memory (bytes) snapshot=0
>>                 Virtual memory (bytes) snapshot=0
>> Job Finished in 24.515 seconds
>> java.io.FileNotFoundException: File does not exist: hdfs://Single-Hadoop.zd.com/user/root/QuasiMonteCarlo_1399737371038_1022927375/out/reduce-out
>>         at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
>>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>>         at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
>>         at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>>         at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>>         at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>
>> Why would any job fail? Is it because the configurations are not correct?
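
P.S. To make "conflict with other memory-related properties" concrete, below
is a minimal sketch of the kind of mutually consistent settings I mean. The
numbers are illustrative assumptions, not the actual values from my cluster;
the constraint that matters is that each -Xmx heap fits inside its task's
container size, and that the container sizes fit inside the NodeManager and
scheduler limits:

    <!-- mapred-site.xml (illustrative values only) -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1024</value>           <!-- container size for each map task -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx768m</value>       <!-- map JVM heap; must fit inside the 1024 MB container -->
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2048</value>           <!-- container size for each reduce task -->
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx1536m</value>      <!-- reduce JVM heap; must fit inside the 2048 MB container -->
    </property>

    <!-- yarn-site.xml (the containers above must fit within these limits) -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>8192</value>           <!-- assuming 8 GB usable per NodeManager -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>8192</value>           <!-- largest single container the scheduler will grant -->
    </property>

If an -Xmx exceeds its container size, or a malformed value ends up in the
java.opts string, the task JVM is killed or never starts, which could show up
as the "Exception from container-launch" errors above.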
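
P.P.S. "Exception from container-launch" by itself hides the real error; the
container's own stderr usually names it. A sketch of how to dig it out,
assuming log aggregation is enabled (yarn.log-aggregation-enable set to
true) and using the application ID from the output above:

    # Fetch the aggregated container logs for the failed application;
    # stderr there usually shows the actual JVM launch error.
    yarn logs -applicationId application_1399733823963_0004

    # Without log aggregation, look on the NodeManager host instead, in the
    # directory configured by yarn.nodemanager.log-dirs (path below is a
    # common default, not necessarily yours):
    ls $HADOOP_HOME/logs/userlogs/application_1399733823963_0004/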