hadoop-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: DataNode and TaskTracker communication
Date Mon, 13 Aug 2012 12:51:38 GMT
Hello there,

     Could you please share your /etc/hosts file, if you don't mind.
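
     For reference, on a small cluster the hosts file usually lists every
     node by its real address, and the machine's own hostname should not be
     mapped to 127.0.1.1. A minimal sketch, using the addresses that appear
     in your logs (placeholders only; adjust to your actual setup):

         127.0.0.1       localhost
         141.51.205.41   its-cs131.its.uni-kassel.de   its-cs131
         141.51.205.43   its-cs133.its.uni-kassel.de   its-cs133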

Regards,
    Mohammad Tariq



On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek
<macek@cs.uni-kassel.de> wrote:

> Hi,
>
> I am currently trying to run my Hadoop program on a cluster. Sadly,
> my DataNodes and TaskTrackers seem to have trouble communicating with
> each other, as their logs show:
> * Some DataNodes and TaskTrackers seem to have port problems of some
> kind, as can be seen in the logs below. I wondered whether this might be
> correlated with the localhost entry in /etc/hosts, as many posts with
> similar errors suggest, but I checked the file: neither localhost nor
> 127.0.0.1/127.0.1.1 is bound there. (Although you can ping localhost...
> the technician of the cluster said he would look into how localhost gets
> resolved.)
> * The other nodes cannot talk to the NameNode and JobTracker
> (its-cs131), and it is not at all clear why this happens: the "dfs -put"
> I do directly before starting the job runs fine, which seems to imply
> that communication between those servers works flawlessly (a quick check
> for this is sketched right below).
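> One way to verify that assumption would be to check, on its-cs131 itself,
> whether the NameNode and JobTracker are really listening on the ports the
> slaves try to reach (35554 and 35555 here). A rough sketch, assuming jps
> and netstat are available on that machine:
>
>     jps                         # should list NameNode and JobTracker
>     netstat -tlnp | grep 3555   # should show LISTEN entries for 35554/35555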
>
> Is there any reason why this might happen?
>
>
> Regards,
> Elmar
>
> LOGS BELOW:
>
> \____Datanodes
>
> After successfully putting the data into HDFS (at which point I assume
> the NameNode and DataNodes have to communicate), I get the following
> errors when starting the job:
>
> There are two kinds of logs I found; the first one is big (about 12 MB)
> and looks like this:
> ############################### LOG TYPE 1 ############################################################
> 2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
> 2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 1 time(s).
> 2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 2 time(s).
> 2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 3 time(s).
> 2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 4 time(s).
> 2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 5 time(s).
> 2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 6 time(s).
> 2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 7 time(s).
> 2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 8 time(s).
> 2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35554. Already tried 9 time(s).
> 2012-08-13 08:23:36,335 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: Call to its-cs131/141.51.205.41:35554 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at $Proxy5.sendHeartbeat(Unknown Source)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>     ... 5 more
>
> ... (this continues until the end of the log)
>
> The second kind is short:
> ########################### LOG TYPE 2 ############################################################
> 2012-08-13 00:59:19,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /****************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.2
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
> ****************************************************************/
> 2012-08-13 00:59:19,203 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-13 00:59:19,216 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-13 00:59:19,217 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-13 00:59:19,218 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
> 2012-08-13 00:59:19,306 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-13 00:59:19,346 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-13 00:59:20,482 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
> 2012-08-13 00:59:21,584 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/work/bmacek/hadoop/hdfs/slave is not formatted.
> 2012-08-13 00:59:21,584 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2012-08-13 00:59:21,787 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
> 2012-08-13 00:59:21,897 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
> 2012-08-13 00:59:21,897 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
> 2012-08-13 00:59:21,898 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException: Problem binding to /0.0.0.0:50010 : Address already in use
>     at org.apache.hadoop.ipc.Server.bind(Server.java:227)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
> Caused by: java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:225)
>     ... 7 more
>
> 2012-08-13 00:59:21,899 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /****************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at its-cs133.its.uni-kassel.de/141.51.205.43
> ****************************************************************/
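>
> The "Address already in use" on 0.0.0.0:50010 reads as if something on
> this host (perhaps a leftover DataNode that was never stopped) is still
> bound to the DataNode data port. A rough way to check, assuming jps and
> netstat are installed on the slave:
>
>     jps                          # any old DataNode process still running?
>     netstat -tlnp | grep 50010   # which PID holds the data-transfer port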
>
>
>
>
>
> \_____TaskTracker
> With the TaskTrackers it is the same: there are two kinds of logs.
> ############################### LOG TYPE 1 ############################################################
> 2012-08-13 02:09:54,645 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'its-cs131' with reponseId '879
> 2012-08-13 02:09:55,646 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
> 2012-08-13 02:09:56,646 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 1 time(s).
> 2012-08-13 02:09:57,647 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 2 time(s).
> 2012-08-13 02:09:58,647 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 3 time(s).
> 2012-08-13 02:09:59,648 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 4 time(s).
> 2012-08-13 02:10:00,648 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 5 time(s).
> 2012-08-13 02:10:01,649 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 6 time(s).
> 2012-08-13 02:10:02,649 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 7 time(s).
> 2012-08-13 02:10:03,650 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 8 time(s).
> 2012-08-13 02:10:04,650 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: its-cs131/141.51.205.41:35555. Already tried 9 time(s).
> 2012-08-13 02:10:04,651 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.ConnectException: Call to its-cs131/141.51.205.41:35555 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown Source)
>     at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857)
>     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653)
>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>     ... 6 more
>
>
> ########################### LOG TYPE 2 ############################################################
> 2012-08-13 00:59:24,376 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
> /****************************************************************
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.2
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
> ****************************************************************/
> 2012-08-13 00:59:24,569 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-13 00:59:24,626 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-13 00:59:24,627 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-13 00:59:24,627 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
> 2012-08-13 00:59:24,950 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-13 00:59:25,146 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2012-08-13 00:59:25,206 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> 2012-08-13 00:59:25,232 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-08-13 00:59:25,237 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as bmacek
> 2012-08-13 00:59:25,239 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /home/work/bmacek/hadoop/hdfs/tmp/mapred/local
> 2012-08-13 00:59:25,244 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-13 00:59:25,255 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
> 2012-08-13 00:59:25,256 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
> 2012-08-13 00:59:25,279 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2012-08-13 00:59:25,282 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort54850 registered.
> 2012-08-13 00:59:25,282 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort54850 registered.
> 2012-08-13 00:59:25,287 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54850: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54850: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:54850
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
> 2012-08-13 00:59:26,321 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
> 2012-08-13 00:59:38,104 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
> 2012-08-13 00:59:38,120 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
> 2012-08-13 00:59:38,134 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228
> 2012-08-13 00:59:38,137 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.
> 2012-08-13 00:59:38,145 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
> 2012-08-13 00:59:38,158 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ShuffleServerMetrics registered.
> 2012-08-13 00:59:38,161 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
> 2012-08-13 00:59:38,161 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
>     at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)
>
> 2012-08-13 00:59:38,163 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
> /****************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at its-cs133.its.uni-kassel.de/141.51.205.43
> ****************************************************************/
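>
> If two DataNode/TaskTracker instances really have to coexist on one host,
> the clashing ports can in principle be moved in the configuration. A
> sketch, assuming the standard Hadoop 1.x property names
> (dfs.datanode.address for the 50010 data port,
> mapred.task.tracker.http.address for the 50060 HTTP port):
>
>     <!-- hdfs-site.xml -->
>     <property>
>       <name>dfs.datanode.address</name>
>       <value>0.0.0.0:50011</value>
>     </property>
>
>     <!-- mapred-site.xml -->
>     <property>
>       <name>mapred.task.tracker.http.address</name>
>       <value>0.0.0.0:50061</value>
>     </property>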
>
