From: Ty at SummaZoo
To: hadoop-user@lucene.apache.org
Reply-To: hadoop-user@lucene.apache.org
Subject: DFS Question
Date: Mon, 31 Jul 2006 11:25:45 -0400
Message-Id: <6AC5DA3B-D3DD-4F1F-8B5B-58C0ABA20A9C@summazoo.com>
Hello,

I'm evaluating Hadoop for a large GIS application. When running the wordcount example, I experience an issue where my master node cannot open a socket to port 50010 of my remote slave node. When I run the example with only my master in the slaves file, it works fine. When I add a second machine, I get the error.

Here is my config:

Running 0.3.2 of Hadoop
OS X 10.4.7 Server for master (Elric.local - 10.0.1.4)
OS X 10.4.7 for remote slave (Corum.local - 10.0.1.8)

Using the standard hadoop-default.xml. Here's my hadoop-site.xml (which is the same on both machines):

<?xml version="1.0"?>
<configuration>

<property>
  <name>fs.default.name</name>
  <value>Elric.local:9000</value>
  <description>The name of the default file system. Either the literal string
  "local" or a host:port for NDFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>Elric.local:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If
  "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>12</value>
  <description>define mapred.map tasks to be number of slave hosts</description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>12</value>
  <description>define mapred.reduce tasks to be number of slave hosts</description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<property>
  <name>dfs.datanode.port</name>
  <value>50010</value>
  <description>The port number that the dfs datanode server uses as a starting
  point to look for a free port to listen on.</description>
</property>

<property>
  <name>dfs.namenode.logging.level</name>
  <value>debug</value>
  <description>The logging level for dfs namenode. Other values are "dir" (trace
  namespace mutations), "block" (trace block under/over replications and block
  creations/deletions), or "all".</description>
</property>

</configuration>
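As a sanity check on the datanode port itself, I probe it with a short script from the master (just a throwaway sketch; check_port is my own helper name, not anything from Hadoop):

```python
import socket

def check_port(host, port, timeout=5.0):
    """Try a plain TCP connect to host:port; True on success,
    False on refusal or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the master (Elric.local), probe the slave's datanode port:
# check_port("Corum.local", 50010)
```

If this times out from Elric.local while the same check against the master's own datanode succeeds, the problem would seem to be at the network layer rather than in Hadoop itself.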
Here is the terminal output on the server:

Elric:/nutch/hadoop nutch$ ./start-all.sh
-su: ./start-all.sh: No such file or directory
Elric:/nutch/hadoop nutch$ ./bin/start-all.sh
rsync from Elric.local:/nutch/hadoop
starting namenode, logging to /nutch/hadoop/logs/hadoop-nutch-namenode-Elric.local.out
Elric.local: rsync from Elric.local:/nutch/hadoop
10.0.1.8: rsync from Elric.local:/nutch/hadoop
Elric.local: starting datanode, logging to /nutch/hadoop/logs/hadoop-nutch-datanode-Elric.local.out
10.0.1.8: starting datanode, logging to /nutch/hadoop/logs/hadoop-nutch-datanode-Corum.local.out
rsync from Elric.local:/nutch/hadoop
starting jobtracker, logging to /nutch/hadoop/logs/hadoop-nutch-jobtracker-Elric.local.out
Elric.local: rsync from Elric.local:/nutch/hadoop
10.0.1.8: rsync from Elric.local:/nutch/hadoop
Elric.local: starting tasktracker, logging to /nutch/hadoop/logs/hadoop-nutch-tasktracker-Elric.local.out
10.0.1.8: starting tasktracker, logging to /nutch/hadoop/logs/hadoop-nutch-tasktracker-Corum.local.out
Elric:/nutch/hadoop nutch$ ./bin/hadoop jar hadoop-*-examples.jar wordcount cat out4
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/mapred-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-site.xml
06/07/31 11:15:45 INFO ipc.Client: Client connection to 10.0.1.4:9000: starting
06/07/31 11:15:45 INFO ipc.Client: Client connection to 10.0.1.4:9001: starting
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-site.xml
06/07/31 11:18:47 INFO fs.DFSClient: Waiting to find target node: Corum.local/10.0.1.8:50010

Here is the netstat on the server while waiting:

Elric:/nutch/hadoop/logs ty$ netstat
Active Internet connections
Proto Recv-Q Send-Q  Local Address           Foreign Address         (state)
tcp4       0      0  10.0.1.4.49808          corum.local.50010       SYN_SENT
tcp4       0      0  10.0.1.4.49807          corum.local.50010       SYN_SENT
tcp4       0      0  10.0.1.4.49806          corum.local.50010       SYN_SENT
tcp4       0      0  10.0.1.4.etlservicemgr  10.0.1.4.49795          ESTABLISHED
tcp4       0      0  10.0.1.4.49795          10.0.1.4.etlservicemgr  ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener     10.0.1.4.49794          ESTABLISHED
tcp4       0      0  10.0.1.4.49794          10.0.1.4.cslistener     ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener     corum.local.49265       ESTABLISHED
tcp4       0      0  10.0.1.4.etlservicemgr  corum.local.49264       ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener     10.0.1.4.49791          ESTABLISHED
tcp4       0      0  10.0.1.4.49791          10.0.1.4.cslistener     ESTABLISHED
tcp4       0      0  10.0.1.4.etlservicemgr  10.0.1.4.49790          ESTABLISHED
tcp4       0      0  10.0.1.4.49790          10.0.1.4.etlservicemgr  ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener     10.0.1.4.49784          ESTABLISHED
tcp4       0      0  10.0.1.4.49784          10.0.1.4.cslistener     ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener     corum.local.49260       ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener     10.0.1.4.49780          ESTABLISHED
tcp4       0      0  10.0.1.4.49780          10.0.1.4.cslistener     ESTABLISHED
tcp4       0      0  10.0.1.4.49756          mail.mac.com.imap       ESTABLISHED
tcp4       0      0  10.0.1.4.49732          mail.mac.com.imap       ESTABLISHED
tcp4       0      0  localhost.netinfo-loca  localhost.1015          ESTABLISHED
tcp4       0      0  localhost.1015          localhost.netinfo-loca  ESTABLISHED
tcp4       0      0  localhost.ipulse-ics    localhost.49174         ESTABLISHED
tcp4       0      0  localhost.49174         localhost.ipulse-ics    ESTABLISHED
tcp4       0      0  localhost.ipulse-ics    localhost.49173         ESTABLISHED
tcp4       0      0  localhost.49173         localhost.ipulse-ics    ESTABLISHED
tcp4       0      0  localhost.ipulse-ics    localhost.49172         ESTABLISHED
tcp4       0      0  localhost.49172         localhost.ipulse-ics    ESTABLISHED
tcp4       0      0  localhost.netinfo-loca  localhost.1017          ESTABLISHED
tcp4       0      0  localhost.1017          localhost.netinfo-loca  ESTABLISHED
tcp4       0      0  localhost.netinfo-loca  localhost.1021          ESTABLISHED
tcp4       0      0  localhost.1021          localhost.netinfo-loca  ESTABLISHED
udp4       0      0  localhost.49292         localhost.1023

The exception I get in the log is:

2006-07-31 11:17:46,390 INFO org.apache.hadoop.dfs.DataNode: Received block blk_316809370547197643 from /10.0.1.4
2006-07-31 11:18:31,185 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-8788276503516502504 to Corum.local/10.0.1.8:50010
java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:430)
        at java.net.Socket.connect(Socket.java:507)
        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:782)
        at java.lang.Thread.run(Thread.java:613)

Can anyone help me identify the issue?

Thanks!
Ty
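P.S. To pick the stuck sockets out of a netstat dump like the one above, I use a small filter (another throwaway helper of mine, nothing standard):

```python
def stuck_connections(netstat_output):
    """Return (local, foreign) address pairs for TCP sockets stuck in SYN_SENT."""
    pairs = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # netstat rows look like: proto recv-q send-q local-addr foreign-addr state
        if len(fields) == 6 and fields[5] == "SYN_SENT":
            pairs.append((fields[3], fields[4]))
    return pairs
```

On the dump above it pulls out the three half-open connections from the master's ephemeral ports to corum.local.50010, which lines up with the DFSClient hang.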