hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: DataNode and Tasttracker communication
Date Mon, 13 Aug 2012 12:59:21 GMT
If the nodes can communicate and distribute data, then the odds are that the issue isn't going
to be in his /etc/hosts. 

A more relevant question is whether he's running a firewall on each of these machines.

A simple test... ssh to one node, ping other nodes and the control nodes at random to see
if they can see one another. Then check to see if there is a firewall running which would
limit the types of traffic between nodes. 
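
Ping only shows that the hosts can see each other; it says nothing about whether a given TCP port is reachable. As a rough sketch of the second half of that test, something like the following could be run from each slave node. The helper names (can_connect, report) are made up for illustration, and the host and ports you pass in would be the ones from the quoted logs (its-cs131, 35554 for the namenode and 35555 for the jobtracker).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result covers several cases: nothing is listening on the port
    (connection refused), a firewall is dropping or rejecting the traffic,
    or the host name does not resolve.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror
        # are all subclasses of OSError.
        return False


def report(targets, timeout=3.0):
    """Return one human-readable status line per (host, port) pair."""
    lines = []
    for host, port in targets:
        state = "reachable" if can_connect(host, port, timeout) else "NOT reachable"
        lines.append(f"{host}:{port} is {state}")
    return lines
```

For example, print("\n".join(report([("its-cs131", 35554), ("its-cs131", 35555)]))) on a slave node. If the ports are reachable from the master itself but not from the slaves, a firewall becomes the more likely suspect; if they are not reachable from anywhere, the daemon is probably not listening on that port at all.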

One other side note... are these machines multi-homed?
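
The reason for asking: the quoted TaskTracker log reports "TaskTracker up at: localhost/127.0.0.1:54850", so it is worth seeing what each node's name actually resolves to. Here is a minimal sketch, assuming IPv4 only; the function names are hypothetical, and it only inspects name resolution, not the network interfaces themselves.

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Return the sorted, de-duplicated IPv4 addresses hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr is (address, port) for IPv4.
    return sorted({info[4][0] for info in infos})


def describe(hostname):
    """Summarize how hostname resolves on the machine running this code."""
    addrs = resolved_addresses(hostname)
    loopback = [a for a in addrs if ipaddress.ip_address(a).is_loopback]
    return {
        "hostname": hostname,
        "addresses": addrs,
        "loopback": loopback,         # non-empty: the name maps to 127.x.x.x
        "multiple": len(addrs) > 1,   # more than one address: possibly multi-homed
    }
```

Running describe(socket.gethostname()) on each node gives a quick picture. A non-empty "loopback" list for the node's own name would be one possible explanation for a daemon announcing itself on 127.0.0.1, and "multiple" being True hints at more than one address being registered for that name.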

On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hello there,
> 
>      Could you please share your /etc/hosts file, if you don't mind.
> 
> Regards,
>     Mohammad Tariq
> 
> 
> 
> On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <macek@cs.uni-kassel.de> wrote:
> Hi,
> 
> I am currently trying to run my Hadoop program on a cluster. Sadly, my datanodes and tasktrackers seem to have difficulty communicating, as their logs show:
> * Some datanodes and tasktrackers seem to have port problems of some kind, as can be seen in the logs below. I wondered whether this might be related to the localhost entry in /etc/hosts, as many posts about similar errors suggest, but I checked the file and neither localhost nor 127.0.0.1/127.0.1.1 is bound there. (You can still ping localhost, though; the cluster's technician said he would look into what is resolving localhost.)
> * The other nodes cannot talk to the namenode and jobtracker (its-cs131). It is not at all clear why this is happening: the "dfs -put" I run directly before the job works fine, which seems to imply that communication between those servers is working flawlessly.
> 
> Is there any reason why this might happen?
> 
> 
> Regards,
> Elmar
> 
> LOGS BELOW:
> 
> \____Datanodes
> 
> After successfully putting the data into HDFS (at which point I assumed the namenode and datanodes must be communicating), I get the following errors when starting the job:
> 
> I found 2 kinds of logs. The first one is big (about 12 MB) and looks like this:
> ############################### LOG TYPE 1 ############################################################
> 2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
> 2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 1 time(s).
> 2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 2 time(s).
> 2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 3 time(s).
> 2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 4 time(s).
> 2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 5 time(s).
> 2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 6 time(s).
> 2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 7 time(s).
> 2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 8 time(s).
> 2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 9 time(s).
> 2012-08-13 08:23:36,335 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: Call to its-cs131/141.51.205.41:35554 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at $Proxy5.sendHeartbeat(Unknown Source)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>     ... 5 more
> 
> ... (this continues until the end of the log)
> 
> The second kind is short:
> ########################### LOG TYPE 2 ############################################################
> 2012-08-13 00:59:19,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.2
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
> ************************************************************/
> 2012-08-13 00:59:19,203 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-13 00:59:19,216 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-13 00:59:19,217 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-13 00:59:19,218 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
> 2012-08-13 00:59:19,306 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-13 00:59:19,346 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-13 00:59:20,482 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
> 2012-08-13 00:59:21,584 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/work/bmacek/hadoop/hdfs/slave is not formatted.
> 2012-08-13 00:59:21,584 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2012-08-13 00:59:21,787 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
> 2012-08-13 00:59:21,897 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
> 2012-08-13 00:59:21,897 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
> 2012-08-13 00:59:21,898 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException: Problem binding to /0.0.0.0:50010 : Address already in use
>     at org.apache.hadoop.ipc.Server.bind(Server.java:227)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
> Caused by: java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:225)
>     ... 7 more
> 
> 2012-08-13 00:59:21,899 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at its-cs133.its.uni-kassel.de/141.51.205.43
> ************************************************************/
> 
> 
> 
> 
> 
> \_____TaskTracker
> With TaskTrackers it is the same: there are 2 kinds.
> ############################### LOG TYPE 1 ############################################################
> 2012-08-13 02:09:54,645 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'its-cs131' with reponseId '879
> 2012-08-13 02:09:55,646 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
> 2012-08-13 02:09:56,646 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 1 time(s).
> 2012-08-13 02:09:57,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 2 time(s).
> 2012-08-13 02:09:58,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 3 time(s).
> 2012-08-13 02:09:59,648 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 4 time(s).
> 2012-08-13 02:10:00,648 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 5 time(s).
> 2012-08-13 02:10:01,649 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 6 time(s).
> 2012-08-13 02:10:02,649 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 7 time(s).
> 2012-08-13 02:10:03,650 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 8 time(s).
> 2012-08-13 02:10:04,650 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 9 time(s).
> 2012-08-13 02:10:04,651 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.ConnectException: Call to its-cs131/141.51.205.41:35555 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown Source)
>     at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857)
>     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653)
>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>     ... 6 more
> 
> 
> ########################### LOG TYPE 2 ############################################################
> 2012-08-13 00:59:24,376 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.2
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
> ************************************************************/
> 2012-08-13 00:59:24,569 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-13 00:59:24,626 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-13 00:59:24,627 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-13 00:59:24,627 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
> 2012-08-13 00:59:24,950 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-13 00:59:25,146 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2012-08-13 00:59:25,206 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> 2012-08-13 00:59:25,232 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-08-13 00:59:25,237 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as bmacek
> 2012-08-13 00:59:25,239 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /home/work/bmacek/hadoop/hdfs/tmp/mapred/local
> 2012-08-13 00:59:25,244 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-13 00:59:25,255 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
> 2012-08-13 00:59:25,256 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
> 2012-08-13 00:59:25,279 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2012-08-13 00:59:25,282 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort54850 registered.
> 2012-08-13 00:59:25,282 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort54850 registered.
> 2012-08-13 00:59:25,287 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54850: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54850: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:54850
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
> 2012-08-13 00:59:26,321 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
> 2012-08-13 00:59:38,104 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
> 2012-08-13 00:59:38,120 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
> 2012-08-13 00:59:38,134 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228
> 2012-08-13 00:59:38,137 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.
> 2012-08-13 00:59:38,145 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
> 2012-08-13 00:59:38,158 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ShuffleServerMetrics registered.
> 2012-08-13 00:59:38,161 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
> 2012-08-13 00:59:38,161 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
>     at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)
> 
> 2012-08-13 00:59:38,163 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at its-cs133.its.uni-kassel.de/141.51.205.43
> ************************************************************/
> 

