hadoop-common-user mailing list archives

From jiang licht <licht_ji...@yahoo.com>
Subject where does jobtracker get the IP and port of namenode?
Date Tue, 09 Mar 2010 03:24:45 GMT
This question may sound silly, but I have run into a strange problem.

The namenode and datanodes start without any problem, and HDFS reports healthy.

But the tasktracker on the slaves cannot start. In the tasktracker log, I found that it keeps
trying to talk to the namenode at A@a (IP address A, port a). In core-site.xml, however, the
namenode is configured as B@a. Both A and B are IP addresses of the namenode box; B is actually
an IP alias for the loopback interface on that box. So a datanode sends its requests to B@a and
is answered from A@a, which works fine, and HDFS comes up. To start, the tasktracker apparently
also needs to contact the namenode, but instead of B@a it uses A@a, which I don't understand.
Where does the tasktracker get A from? Is there a setting specifically for the tasktracker that
determines the namenode IP address and port? If it read them from core-site.xml, it would use
B@a rather than A@a. I am confused. Any thoughts?

Here's what is set in core-site.xml:

dfs.default.name=>hdfs://B:50001

Here's what is set in mapred-site.xml:

mapred.job.tracker=>B:50002

On the slave boxes, A and B are distinct addresses: the slaves can reach B but not A. This is
why the tasktracker cannot start when it tries to contact the namenode at A:50001 (see the
error messages below).
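The reachability claim above can be checked from a slave box with a plain TCP connect, independent of Hadoop. The following is a minimal sketch, assuming Python 3 is available on the slave; `can_connect` is a hypothetical helper, and the 20-second default simply mirrors the 20000 ms connect timeout reported in the tasktracker log.

```python
import socket


def can_connect(host, port, timeout_s=20.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False


# Example use on a slave box (replace the placeholders with the real
# addresses behind A and B; 50001 is the namenode port from core-site.xml):
#   can_connect("<address A>", 50001)
#   can_connect("<address B>", 50001)
```

If the call for B succeeds while the call for A fails, the failure in the log is a plain network reachability problem for A, and the remaining question is only why the tasktracker was pointed at A.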

Here is an excerpt from the tasktracker log:

...
2010-03-08 21:04:06,169 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /A:50001. Already tried 44 time(s).
2010-03-08 21:04:26,170 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: Call to /A:50001 failed on socket timeout exception: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/A:50001]
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:771)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy5.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:110)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:211)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:174)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1448)
    at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1476)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:197)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1034)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1721)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2834)
Caused by: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/A:50001]
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:407)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    ... 16 more

2010-03-08 21:04:47,178 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /A:50001. Already tried 0 time(s).
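
One possible explanation, which nothing in this message confirms: the stack trace passes through Path.getFileSystem and FileSystem.get, and as far as I know FileSystem.get prefers the scheme and authority embedded in a fully qualified path over the default filesystem URI from the configuration. If the tasktracker is handed a path that is already qualified with A (for instance by another daemon), the value in its own core-site.xml would not be consulted for that path. The sketch below is a simplified Python model of that rule, not Hadoop code; `effective_endpoint`, the example addresses, and the `/mapred/system` path are all hypothetical.

```python
from urllib.parse import urlsplit


def effective_endpoint(path, default_fs):
    """Return the (host, port) a client would contact for `path`.

    Simplified model: a path that carries its own scheme and authority
    wins; otherwise the configured default filesystem URI is used.
    """
    parts = urlsplit(path)
    if parts.scheme and parts.netloc:
        chosen = parts                 # fully qualified path: config is ignored
    else:
        chosen = urlsplit(default_fs)  # bare path: fall back to the default
    return chosen.hostname, chosen.port


# Hypothetical stand-ins: 10.0.0.2 plays the role of B, 192.168.1.1 of A.
DEFAULT_FS = "hdfs://10.0.0.2:50001"

print(effective_endpoint("/mapred/system", DEFAULT_FS))
# -> ('10.0.0.2', 50001)
print(effective_endpoint("hdfs://192.168.1.1:50001/mapred/system", DEFAULT_FS))
# -> ('192.168.1.1', 50001)
```

Under this model, the place to look for A is whichever component produces the fully qualified path, rather than the tasktracker's own configuration files.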

Thanks,
--

Michael


      