Subject: Unable to start Hadoop mapred cluster on EC2 with Hadoop 0.20.0
Date: Mon, 20 Jul 2009 14:20:00 -0700
From: "Jeyendran Balakrishnan"

Hello,

I downloaded Hadoop 0.20.0 and used the src/contrib/ec2/bin scripts to launch a Hadoop cluster on Amazon EC2. To do so, I modified the bundled scripts for my EC2 account, and then created my own Hadoop 0.20.0 AMI. The steps I followed for creating AMIs and launching EC2 Hadoop clusters are the same ones I have been using for over a year with Hadoop 0.18.* and 0.19.*.

I launched an instance with my new Hadoop 0.20.0 AMI, then logged in and ran the following to launch a new cluster:

root(/vol/hadoop-0.20.0)> bin/launch-hadoop-cluster hadoop-test 2

After the usual EC2 wait, one master and two slave instances were launched on EC2, as expected. When I ssh'ed into the instances, here is what I found:

Slaves: DataNode and NameNode are running
Master: Only NameNode is running

I could use HDFS commands (using $HADOOP_HOME/bin/hadoop scripts) without any problems, from both master and slaves. However, since the JobTracker is not running, I cannot run map-reduce jobs.

I checked the logs from /vol/hadoop-0.20.0/logs for the JobTracker, reproduced below:

-----------------------------------------------
<<<
2009-07-20 16:56:30,273 WARN org.apache.hadoop.conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated.
Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
2009-07-20 16:56:30,320 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = domU-12-31-39-04-30-16/10.240.55.228
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
************************************************************/
2009-07-20 16:56:31,332 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=50002
2009-07-20 16:56:31,603 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2009-07-20 16:56:31,900 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
2009-07-20 16:56:31,900 INFO org.mortbay.log: jetty-6.1.14
2009-07-20 16:56:33,461 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
2009-07-20 16:56:33,462 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2009-07-20 16:56:33,531 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 50002
2009-07-20 16:56:33,532 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2009-07-20 16:56:51,554 INFO org.apache.hadoop.mapred.JobTracker: Cleaning up the system directory
2009-07-20 16:56:53,060 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        at org.apache.hadoop.ipc.Client.call(Client.java:739)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy4.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy4.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
...
...
2009-07-20 16:56:55,878 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /mnt/hadoop/mapred/system/jobtracker.info retries left 1
2009-07-20 16:56:59,082 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
...
...
2009-07-20 16:57:00,092 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to domU-12-31-39-04-30-16.compute-1.internal/10.240.55.228:50002 : Address already in use
        at org.apache.hadoop.ipc.Server.bind(Server.java:190)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:253)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:1026)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:488)
        at org.apache.hadoop.ipc.RPC.getServer(RPC.java:450)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1537)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:174)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3528)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at org.apache.hadoop.ipc.Server.bind(Server.java:188)
        ... 7 more
2009-07-20 16:57:00,093 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down JobTracker at domU-12-31-39-04-30-16/10.240.55.228
************************************************************/
>>>
-----------------------------------------------

So it looks like the JobTracker launched, but then died trying to replicate the jobtracker.info file to one or more slaves.

Would appreciate any help in this...

Thanks a lot,
jp
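
A note on the FATAL entry in the log above: the JobTracker process exits on a java.net.BindException for port 50002 ("Address already in use"), which is logged after the replication warnings. One way to see whether something is already listening on that port is a small TCP probe run on the master. The following is only a minimal sketch in Python, not part of Hadoop or the EC2 scripts; the function name port_in_use is hypothetical, and the ports 50002 and 50030 are simply the ones that appear in the log.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    i.e. some process is already listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Assumption: run on the master itself, so "localhost" is probed.
    # 50002 is the JobTracker RPC port and 50030 its web UI port in the log.
    for port in (50002, 50030):
        state = "in use" if port_in_use("localhost", port) else "free"
        print(f"port {port}: {state}")
```

If the probe reports 50002 as in use while no JobTracker is expected to be running, that would suggest another process (possibly an earlier JobTracker instance) still holds the port. This is only a hint for narrowing things down and does not by itself explain the replication errors.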