hadoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Björn-Elmar Macek <ma...@cs.uni-kassel.de>
Subject DataNode and Tasttracker communication
Date Mon, 13 Aug 2012 12:31:06 GMT
Hi,

i am currently trying to run my hadoop program on a cluster. Sadly 
though my datanodes and tasktrackers seem to have difficulties with 
their communication as their logs say:
* Some datanodes and tasktrackers seem to have portproblems of some kind 
as it can be seen in the logs below. I wondered if this might be due to 
reasons correllated with the localhost entry in /etc/hosts as you can 
read in alot of posts with similar errors, but i checked the file 
neither localhost nor 127.0.0.1/127.0.1.1 is bound there. (although you 
can ping localhost... the technician of the cluster said he'd be looking 
for the mechanics resolving localhost)
* The other nodes can not speak with the namenode and jobtracker 
(its-cs131). Although it is absolutely not clear, why this is happening: 
the "dfs -put" i do directly before the job is running fine, which seems 
to imply that communication between those servers is working flawlessly.

Is there any reason why this might happen?


Regards,
Elmar

LOGS BELOW:

\____Datanodes

After successfully putting the data to hdfs (at this point i thought 
namenode and datanodes have to communicate), i get the following errors 
when starting the job:

There are 2 kinds of logs i found: the first one is big (about 12MB) and 
looks like this:
############################### LOG TYPE 1 
############################################################
2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 1 time(s).
2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 2 time(s).
2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 3 time(s).
2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 4 time(s).
2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 5 time(s).
2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 6 time(s).
2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 7 time(s).
2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 8 time(s).
2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 9 time(s).
2012-08-13 08:23:36,335 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
java.net.ConnectException: Call to its-cs131/141.51.205.41:35554 failed 
on connection exception: java.net.ConnectException: Connection refused
     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
     at $Proxy5.sendHeartbeat(Unknown Source)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
     at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
     at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
     at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
     at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
     ... 5 more

... (this continues til the end of the log)

The second is short kind:
########################### LOG TYPE 2 
############################################################
2012-08-13 00:59:19,038 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.2
STARTUP_MSG:   build = 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 
1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
************************************************************/
2012-08-13 00:59:19,203 INFO 
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
2012-08-13 00:59:19,216 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
MetricsSystem,sub=Stats registered.
2012-08-13 00:59:19,217 INFO 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot 
period at 10 second(s).
2012-08-13 00:59:19,218 INFO 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics 
system started
2012-08-13 00:59:19,306 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
ugi registered.
2012-08-13 00:59:19,346 INFO org.apache.hadoop.util.NativeCodeLoader: 
Loaded the native-hadoop library
2012-08-13 00:59:20,482 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
2012-08-13 00:59:21,584 INFO 
org.apache.hadoop.hdfs.server.common.Storage: Storage directory 
/home/work/bmacek/hadoop/hdfs/slave is not formatted.
2012-08-13 00:59:21,584 INFO 
org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2012-08-13 00:59:21,787 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: Registered 
FSDatasetStatusMBean
2012-08-13 00:59:21,897 INFO 
org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: 
Shutting down all async disk service threads...
2012-08-13 00:59:21,897 INFO 
org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All 
async disk service threads have been shut down.
2012-08-13 00:59:21,898 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException: 
Problem binding to /0.0.0.0:50010 : Address already in use
     at org.apache.hadoop.ipc.Server.bind(Server.java:227)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
     at 
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
Caused by: java.net.BindException: Address already in use
     at sun.nio.ch.Net.bind(Native Method)
     at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
     at org.apache.hadoop.ipc.Server.bind(Server.java:225)
     ... 7 more

2012-08-13 00:59:21,899 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at 
its-cs133.its.uni-kassel.de/141.51.205.43
************************************************************/





\_____TastTracker
With TaskTrackers it is the same: there are 2 kinds.
############################### LOG TYPE 1 
############################################################
2012-08-13 02:09:54,645 INFO org.apache.hadoop.mapred.TaskTracker: 
Resending 'status' to 'its-cs131' with reponseId '879
2012-08-13 02:09:55,646 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
2012-08-13 02:09:56,646 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 1 time(s).
2012-08-13 02:09:57,647 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 2 time(s).
2012-08-13 02:09:58,647 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 3 time(s).
2012-08-13 02:09:59,648 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 4 time(s).
2012-08-13 02:10:00,648 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 5 time(s).
2012-08-13 02:10:01,649 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 6 time(s).
2012-08-13 02:10:02,649 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 7 time(s).
2012-08-13 02:10:03,650 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 8 time(s).
2012-08-13 02:10:04,650 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 9 time(s).
2012-08-13 02:10:04,651 ERROR org.apache.hadoop.mapred.TaskTracker: 
Caught exception: java.net.ConnectException: Call to 
its-cs131/141.51.205.41:35555 failed on connection exception: 
java.net.ConnectException: Connection refused
     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
     at org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown Source)
     at 
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857)
     at 
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653)
     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503)
     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744)
Caused by: java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
     at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
     at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
     at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
     ... 6 more


########################### LOG TYPE 2 
############################################################
2012-08-13 00:59:24,376 INFO org.apache.hadoop.mapred.TaskTracker: 
STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.2
STARTUP_MSG:   build = 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 
1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
************************************************************/
2012-08-13 00:59:24,569 INFO 
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
2012-08-13 00:59:24,626 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
MetricsSystem,sub=Stats registered.
2012-08-13 00:59:24,627 INFO 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot 
period at 10 second(s).
2012-08-13 00:59:24,627 INFO 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics 
system started
2012-08-13 00:59:24,950 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
ugi registered.
2012-08-13 00:59:25,146 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
org.mortbay.log.Slf4jLog
2012-08-13 00:59:25,206 INFO org.apache.hadoop.http.HttpServer: Added 
global filtersafety 
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-08-13 00:59:25,232 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-08-13 00:59:25,237 INFO org.apache.hadoop.mapred.TaskTracker: 
Starting tasktracker with owner as bmacek
2012-08-13 00:59:25,239 INFO org.apache.hadoop.mapred.TaskTracker: Good 
mapred local directories are: /home/work/bmacek/hadoop/hdfs/tmp/mapred/local
2012-08-13 00:59:25,244 INFO org.apache.hadoop.util.NativeCodeLoader: 
Loaded the native-hadoop library
2012-08-13 00:59:25,255 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
jvm registered.
2012-08-13 00:59:25,256 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
TaskTrackerMetrics registered.
2012-08-13 00:59:25,279 INFO org.apache.hadoop.ipc.Server: Starting 
SocketReader
2012-08-13 00:59:25,282 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
RpcDetailedActivityForPort54850 registered.
2012-08-13 00:59:25,282 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
RpcActivityForPort54850 registered.
2012-08-13 00:59:25,287 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server 
listener on 54850: starting
2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 0 on 54850: starting
2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 1 on 54850: starting
2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: 
TaskTracker up at: localhost/127.0.0.1:54850
2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 3 on 54850: starting
2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 2 on 54850: starting
2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: 
Starting tracker 
tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
2012-08-13 00:59:26,321 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
2012-08-13 00:59:38,104 INFO org.apache.hadoop.mapred.TaskTracker: 
Starting thread: Map-events fetcher for all reduce tasks on 
tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
2012-08-13 00:59:38,120 INFO org.apache.hadoop.util.ProcessTree: setsid 
exited with exit code 0
2012-08-13 00:59:38,134 INFO org.apache.hadoop.mapred.TaskTracker: Using 
ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228
2012-08-13 00:59:38,137 WARN org.apache.hadoop.mapred.TaskTracker: 
TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is 
disabled.
2012-08-13 00:59:38,145 INFO org.apache.hadoop.mapred.IndexCache: 
IndexCache created with max memory = 10485760
2012-08-13 00:59:38,158 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
ShuffleServerMetrics registered.
2012-08-13 00:59:38,161 INFO org.apache.hadoop.http.HttpServer: Port 
returned by webServer.getConnectors()[0].getLocalPort() before open() is 
-1. Opening the listener on 50060
2012-08-13 00:59:38,161 ERROR org.apache.hadoop.mapred.TaskTracker: Can 
not start task tracker because java.net.BindException: Address already 
in use
     at sun.nio.ch.Net.bind(Native Method)
     at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
     at 
org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
     at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502)
     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)

2012-08-13 00:59:38,163 INFO org.apache.hadoop.mapred.TaskTracker: 
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at 
its-cs133.its.uni-kassel.de/141.51.205.43
************************************************************/

Mime
View raw message