hadoop-common-user mailing list archives

From: Ty at SummaZoo <...@summazoo.com>
Subject: DFS Question
Date: Mon, 31 Jul 2006 15:25:45 GMT
Hello,

I'm evaluating Hadoop for a large GIS application.

When running the wordcount example, my master node cannot open a socket to port 50010 on my remote slave node.

When I run the example with only my master in the slaves file, it works fine. When I add a second machine, I get the error.
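To take Hadoop out of the picture, here is the kind of bare-bones connect test I can run from the master (a minimal sketch; the slave address, datanode port, and timeout are hard-coded for my setup):

import java.net.InetSocketAddress;
import java.net.Socket;

// Plain TCP connect to the slave's datanode port, with a timeout,
// mimicking what the datanode does when it transfers a block.
public class PortCheck {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket();
        try {
            // 10.0.1.8:50010 is my slave's datanode address; 5s timeout.
            s.connect(new InetSocketAddress("10.0.1.8", 50010), 5000);
            System.out.println("connected OK");
        } finally {
            s.close();
        }
    }
}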

Here is my config:

Running Hadoop 0.3.2
OS X 10.4.7 Server for master (Elric.local - 10.0.1.4)
OS X 10.4.7 for remote slave (Corum.local - 10.0.1.8)

Using the standard hadoop-default.xml.

Here's my hadoop-site.xml (which is the same on both machines):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
   <name>fs.default.name</name>
   <value>Elric.local:9000</value>
   <description>
     The name of the default file system. Either the literal string
     "local" or a host:port for NDFS.
   </description>
</property>

<property>
   <name>mapred.job.tracker</name>
   <value>Elric.local:9001</value>
   <description>
     The host and port that the MapReduce job tracker runs at. If
     "local", then jobs are run in-process as a single map and
     reduce task.
   </description>
</property>

<property>
   <name>mapred.map.tasks</name>
   <value>12</value>
   <description>
Define mapred.map.tasks to be the number of slave hosts.
   </description>
</property>

<property>
   <name>mapred.reduce.tasks</name>
   <value>12</value>
   <description>
Define mapred.reduce.tasks to be the number of slave hosts.
   </description>
</property>

<property>
   <name>dfs.name.dir</name>
   <value>/nutch/filesystem/name</value>
</property>

<property>
   <name>dfs.data.dir</name>
   <value>/nutch/filesystem/data</value>
</property>

<property>
   <name>mapred.system.dir</name>
   <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
   <name>mapred.local.dir</name>
   <value>/nutch/filesystem/mapreduce/local</value>
</property>

<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>

<property>
   <name>dfs.datanode.port</name>
   <value>50010</value>
   <description>
     The port number that the dfs datanode server uses as a starting
     point to look for a free port to listen on.
   </description>
</property>

<property>
   <name>dfs.namenode.logging.level</name>
   <value>debug</value>
   <description>
     The logging level for dfs namenode. Other values are "dir" (trace
     namespace mutations), "block" (trace block under/over replications
     and block creations/deletions), or "all".
   </description>
</property>
</configuration>
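As I read the dfs.datanode.port description above, 50010 is only a starting point and the datanode probes upward until it can bind. That is my reading of the config text, not the actual DataNode code; a rough sketch of what I mean:

import java.io.IOException;
import java.net.ServerSocket;

// Sketch of the "starting point" behavior described for dfs.datanode.port:
// try to bind, and on failure move on to the next port.
public class PortProbe {
    static ServerSocket bindFrom(int startPort) throws IOException {
        for (int port = startPort; port < startPort + 100; port++) {
            try {
                return new ServerSocket(port); // free port found
            } catch (IOException inUse) {
                // port taken; try the next one
            }
        }
        throw new IOException("no free port near " + startPort);
    }

    public static void main(String[] args) throws IOException {
        ServerSocket ss = bindFrom(50010);
        System.out.println("would listen on " + ss.getLocalPort());
        ss.close();
    }
}

If that reading is right, the slave could in principle end up listening on a port other than 50010 when something else has grabbed it; the datanode log on Corum should say which port it actually bound.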

Here is the terminal output on the server:

Elric:/nutch/hadoop nutch$ ./start-all.sh
-su: ./start-all.sh: No such file or directory
Elric:/nutch/hadoop nutch$ ./bin/start-all.sh
rsync from Elric.local:/nutch/hadoop
starting namenode, logging to /nutch/hadoop/logs/hadoop-nutch-namenode-Elric.local.out
Elric.local: rsync from Elric.local:/nutch/hadoop
10.0.1.8: rsync from Elric.local:/nutch/hadoop
Elric.local: starting datanode, logging to /nutch/hadoop/logs/hadoop-nutch-datanode-Elric.local.out
10.0.1.8: starting datanode, logging to /nutch/hadoop/logs/hadoop-nutch-datanode-Corum.local.out
rsync from Elric.local:/nutch/hadoop
starting jobtracker, logging to /nutch/hadoop/logs/hadoop-nutch-jobtracker-Elric.local.out
Elric.local: rsync from Elric.local:/nutch/hadoop
10.0.1.8: rsync from Elric.local:/nutch/hadoop
Elric.local: starting tasktracker, logging to /nutch/hadoop/logs/hadoop-nutch-tasktracker-Elric.local.out
10.0.1.8: starting tasktracker, logging to /nutch/hadoop/logs/hadoop-nutch-tasktracker-Corum.local.out
Elric:/nutch/hadoop nutch$ ./bin/hadoop jar hadoop-*-examples.jar wordcount cat out4
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/mapred-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-site.xml
06/07/31 11:15:45 INFO ipc.Client: Client connection to 10.0.1.4:9000: starting
06/07/31 11:15:45 INFO ipc.Client: Client connection to 10.0.1.4:9001: starting
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-site.xml
06/07/31 11:18:47 INFO fs.DFSClient: Waiting to find target node: Corum.local/10.0.1.8:50010


Here is the netstat on the server while it waits (note the three connections to corum.local.50010 stuck in SYN_SENT):

Elric:/nutch/hadoop/logs ty$ netstat
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address         (state)
tcp4       0      0  10.0.1.4.49808         corum.local.50010       SYN_SENT
tcp4       0      0  10.0.1.4.49807         corum.local.50010       SYN_SENT
tcp4       0      0  10.0.1.4.49806         corum.local.50010       SYN_SENT
tcp4       0      0  10.0.1.4.etlservicemgr 10.0.1.4.49795          ESTABLISHED
tcp4       0      0  10.0.1.4.49795         10.0.1.4.etlservicemgr  ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    10.0.1.4.49794          ESTABLISHED
tcp4       0      0  10.0.1.4.49794         10.0.1.4.cslistener     ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    corum.local.49265       ESTABLISHED
tcp4       0      0  10.0.1.4.etlservicemgr corum.local.49264       ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    10.0.1.4.49791          ESTABLISHED
tcp4       0      0  10.0.1.4.49791         10.0.1.4.cslistener     ESTABLISHED
tcp4       0      0  10.0.1.4.etlservicemgr 10.0.1.4.49790          ESTABLISHED
tcp4       0      0  10.0.1.4.49790         10.0.1.4.etlservicemgr  ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    10.0.1.4.49784          ESTABLISHED
tcp4       0      0  10.0.1.4.49784         10.0.1.4.cslistener     ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    corum.local.49260       ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    10.0.1.4.49780          ESTABLISHED
tcp4       0      0  10.0.1.4.49780         10.0.1.4.cslistener     ESTABLISHED
tcp4       0      0  10.0.1.4.49756         mail.mac.com.imap       ESTABLISHED
tcp4       0      0  10.0.1.4.49732         mail.mac.com.imap       ESTABLISHED
tcp4       0      0  localhost.netinfo-loca localhost.1015          ESTABLISHED
tcp4       0      0  localhost.1015         localhost.netinfo-loca  ESTABLISHED
tcp4       0      0  localhost.ipulse-ics   localhost.49174         ESTABLISHED
tcp4       0      0  localhost.49174        localhost.ipulse-ics    ESTABLISHED
tcp4       0      0  localhost.ipulse-ics   localhost.49173         ESTABLISHED
tcp4       0      0  localhost.49173        localhost.ipulse-ics    ESTABLISHED
tcp4       0      0  localhost.ipulse-ics   localhost.49172         ESTABLISHED
tcp4       0      0  localhost.49172        localhost.ipulse-ics    ESTABLISHED
tcp4       0      0  localhost.netinfo-loca localhost.1017          ESTABLISHED
tcp4       0      0  localhost.1017         localhost.netinfo-loca  ESTABLISHED
tcp4       0      0  localhost.netinfo-loca localhost.1021          ESTABLISHED
tcp4       0      0  localhost.1021         localhost.netinfo-loca  ESTABLISHED
udp4       0      0  localhost.49292        localhost.1023

The exception I get in the datanode log on the master is:

2006-07-31 11:17:46,390 INFO org.apache.hadoop.dfs.DataNode: Received block blk_316809370547197643 from /10.0.1.4
2006-07-31 11:18:31,185 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-8788276503516502504 to Corum.local/10.0.1.8:50010
java.net.SocketTimeoutException: connect timed out
         at java.net.PlainSocketImpl.socketConnect(Native Method)
         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:430)
         at java.net.Socket.connect(Socket.java:507)
         at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:782)
         at java.lang.Thread.run(Thread.java:613)


As far as I can tell, the datanode on Elric receives the block and then, because dfs.replication is 2, tries to push a copy to Corum, but that connect never completes: the SYNs to corum.local.50010 are never answered. Can anyone help me identify the issue?

Thanks!

Ty