hadoop-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: DataNode and TaskTracker communication
Date Mon, 13 Aug 2012 14:12:33 GMT
Hi Michael,
       I asked for the hosts file because there seems to be a loopback
problem to me: the log shows the call going to 0.0.0.0. Apart from what you
have said, I think disabling IPv6 and making sure that there is no problem
with DNS resolution is also necessary. Please correct me if I am wrong.
Thank you.
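The resolution points above can be checked in a few lines. A minimal sketch in plain Python (stdlib only, nothing Hadoop-specific; the loopback heuristic is an assumption, not a Hadoop API):

```python
import socket

# Hedged sanity check: a Hadoop node's own FQDN should resolve to a real
# interface address, not to a loopback address, or daemons end up
# advertising/binding 127.x addresses to the rest of the cluster.
fqdn = socket.getfqdn()
try:
    addr = socket.gethostbyname(fqdn)
except socket.gaierror:
    addr = None
print("own fqdn:", fqdn, "->", addr)
if addr is None or addr.startswith("127."):
    print("WARNING: own hostname does not resolve to a non-loopback address;"
          " check /etc/hosts and DNS")

# localhost itself should still resolve to a loopback address.
print("localhost ->", socket.gethostbyname("localhost"))
```

Run this on each node; every node should report its own non-loopback address, and all nodes should agree on what the master's name resolves to.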

Regards,
    Mohammad Tariq



On Mon, Aug 13, 2012 at 7:06 PM, Michael Segel <michael_segel@hotmail.com>wrote:

> Based on your /etc/hosts output, why aren't you using DNS?
>
> Outside of MapR, multihomed machines can be problematic. Hadoop doesn't
> generally work well when you're not using the FQDN or its alias.
>
> The issue isn't SSH. If you go to the node that is having trouble
> connecting to another node and try to ping it, or some other general
> communication, and that succeeds, then the port you're trying to
> communicate on is blocked. In that case it's more than likely an
> interface-configuration or firewall issue.
>
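The ping-then-port test above can be sketched in plain Python (hedged: the host and port are just the its-cs131:35554 values from the logs further down; substitute your own):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    A False here, combined with a working ping, is the pattern that
    points at a firewall rule or a daemon bound to the wrong interface.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed out, and unresolvable names
        return False

# The NameNode RPC endpoint the DataNodes are retrying in the logs below.
print(can_connect("its-cs131", 35554))
```

Running it from the failing slave against the master, and against a port you know is open (e.g. 22), separates "host unreachable" from "this one port is blocked".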
> On Aug 13, 2012, at 8:17 AM, Björn-Elmar Macek <ema@cs.uni-kassel.de>
> wrote:
>
>  Hi Michael,
>
> Well, I can ssh from any node to any other without being prompted. The
> reason is that my home directory is mounted on every server in the
> cluster.
>
> Whether the machines are multihomed, I don't know. I could ask, if that
> would be of importance.
>
> Shall I?
>
> Regards,
> Elmar
>
> On 13.08.12 14:59, Michael Segel wrote:
>
> If the nodes can communicate and distribute data, then the odds are that
> the issue isn't going to be in his /etc/hosts.
>
>  A more relevant question is whether he's running a firewall on each of
> these machines.
>
>  A simple test... ssh to one node, ping other nodes and the control nodes
> at random to see if they can see one another. Then check to see if there is
> a firewall running which would limit the types of traffic between nodes.
>
>  One other side note... are these machines multi-homed?
>
>   On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <dontariq@gmail.com> wrote:
>
> Hello there,
>
>       Could you please share your /etc/hosts file, if you don't mind.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <macek@cs.uni-kassel.de
> > wrote:
>
>> Hi,
>>
>> I am currently trying to run my Hadoop program on a cluster. Sadly, my
>> datanodes and tasktrackers seem to have difficulties with their
>> communication, as their logs show:
>> * Some datanodes and tasktrackers seem to have port problems of some
>> kind, as can be seen in the logs below. I wondered whether this might be
>> due to the localhost entry in /etc/hosts, as you can read in a lot of
>> posts with similar errors, but I checked the file: neither localhost nor
>> 127.0.0.1/127.0.1.1 is bound there. (Although you can ping localhost...
>> the technician of the cluster said he'd look into whatever mechanism is
>> resolving localhost.)
>> * The other nodes cannot talk to the namenode and jobtracker (its-cs131),
>> although it is absolutely not clear why this is happening: the "dfs -put"
>> I do directly before the job runs fine, which seems to imply that
>> communication between those servers is working flawlessly.
>>
>> Is there any reason why this might happen?
>>
>>
>> Regards,
>> Elmar
>>
>> LOGS BELOW:
>>
>> \____Datanodes
>>
>> After successfully putting the data into HDFS (at which point I thought
>> the namenode and datanodes have to communicate), I get the following
>> errors when starting the job:
>>
>> There are two kinds of logs I found: the first is big (about 12 MB) and
>> looks like this:
>> ############################### LOG TYPE 1
>> ############################################################
>> 2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 0
>> time(s).
>> 2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 1
>> time(s).
>> 2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 2
>> time(s).
>> 2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 3
>> time(s).
>> 2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 4
>> time(s).
>> 2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 5
>> time(s).
>> 2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 6
>> time(s).
>> 2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 7
>> time(s).
>> 2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 8
>> time(s).
>> 2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 9
>> time(s).
>> 2012-08-13 08:23:36,335 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException:
>> Call to its-cs131/141.51.205.41:35554 failed on connection exception:
>> java.net.ConnectException: Connection refused
>>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>>     at $Proxy5.sendHeartbeat(Unknown Source)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
>>     at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.net.ConnectException: Connection refused
>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>     at
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>     at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>>     at
>> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>>     at
>> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>>     at
>> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>>     ... 5 more
>>
>> ... (this continues until the end of the log)
>>
>> The second is the short kind:
>> ########################### LOG TYPE 2
>> ############################################################
>> 2012-08-13 00:59:19,038 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
>> /************************************************************
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 1.0.2
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r
>> 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
>> ************************************************************/
>> 2012-08-13 00:59:19,203 INFO
>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>> hadoop-metrics2.properties
>> 2012-08-13 00:59:19,216 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> MetricsSystem,sub=Stats registered.
>> 2012-08-13 00:59:19,217 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>> period at 10 second(s).
>> 2012-08-13 00:59:19,218 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
>> started
>> 2012-08-13 00:59:19,306 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
>> registered.
>> 2012-08-13 00:59:19,346 INFO org.apache.hadoop.util.NativeCodeLoader:
>> Loaded the native-hadoop library
>> 2012-08-13 00:59:20,482 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 0
>> time(s).
>> 2012-08-13 00:59:21,584 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Storage directory
>> /home/work/bmacek/hadoop/hdfs/slave is not formatted.
>> 2012-08-13 00:59:21,584 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
>> 2012-08-13 00:59:21,787 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Registered
>> FSDatasetStatusMBean
>> 2012-08-13 00:59:21,897 INFO
>> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting
>> down all async disk service threads...
>> 2012-08-13 00:59:21,897 INFO
>> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async
>> disk service threads have been shut down.
>> 2012-08-13 00:59:21,898 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException:
>> Problem binding to /0.0.0.0:50010 : Address already in use
>>     at org.apache.hadoop.ipc.Server.bind(Server.java:227)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
>>     at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
>> Caused by: java.net.BindException: Address already in use
>>     at sun.nio.ch.Net.bind(Native Method)
>>     at
>> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>>     at org.apache.hadoop.ipc.Server.bind(Server.java:225)
>>     ... 7 more
>>
>> 2012-08-13 00:59:21,899 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down DataNode at
>> its-cs133.its.uni-kassel.de/141.51.205.43
>> ************************************************************/
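The BindException in the log above means some process already holds port 50010 on that slave, typically a leftover DataNode from an earlier start. A small sketch (plain stdlib Python, not Hadoop code; the helper name `port_free` is mine) that reproduces the "Address already in use" condition and doubles as a port-availability probe:

```python
import errno
import socket

def port_free(port, host="0.0.0.0"):
    """Return True if host:port can be bound, i.e. nothing else holds it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except OSError as e:
        if e.errno == errno.EADDRINUSE:  # the DataNode's exact failure mode
            return False
        raise
    finally:
        s.close()

# Reproduce the DataNode's situation: hold a port, then try to bind it again.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("0.0.0.0", 0))          # kernel picks a free port for us
taken = holder.getsockname()[1]
print(port_free(taken))              # False while the holder has it
holder.close()
print(port_free(taken))              # True once the holder is gone
```

On the affected slave, checking `port_free(50010)` (and 50060 for the TaskTracker) before starting the daemons tells you whether a stale process needs to be killed first.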
>>
>>
>>
>>
>>
>> \_____TaskTracker
>> With the TaskTrackers it is the same: there are two kinds.
>> ############################### LOG TYPE 1
>> ############################################################
>> 2012-08-13 02:09:54,645 INFO org.apache.hadoop.mapred.TaskTracker:
>> Resending 'status' to 'its-cs131' with reponseId '879
>> 2012-08-13 02:09:55,646 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 0
>> time(s).
>> 2012-08-13 02:09:56,646 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 1
>> time(s).
>> 2012-08-13 02:09:57,647 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 2
>> time(s).
>> 2012-08-13 02:09:58,647 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 3
>> time(s).
>> 2012-08-13 02:09:59,648 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 4
>> time(s).
>> 2012-08-13 02:10:00,648 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 5
>> time(s).
>> 2012-08-13 02:10:01,649 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 6
>> time(s).
>> 2012-08-13 02:10:02,649 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 7
>> time(s).
>> 2012-08-13 02:10:03,650 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 8
>> time(s).
>> 2012-08-13 02:10:04,650 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 9
>> time(s).
>> 2012-08-13 02:10:04,651 ERROR org.apache.hadoop.mapred.TaskTracker:
>> Caught exception: java.net.ConnectException: Call to
>> its-cs131/141.51.205.41:35555 failed on connection exception:
>> java.net.ConnectException: Connection refused
>>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>>     at org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown Source)
>>     at
>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857)
>>     at
>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653)
>>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503)
>>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744)
>> Caused by: java.net.ConnectException: Connection refused
>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>     at
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>     at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>>     at
>> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>>     at
>> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>>     at
>> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>>     ... 6 more
>>
>>
>> ########################### LOG TYPE 2
>> ############################################################
>> 2012-08-13 00:59:24,376 INFO org.apache.hadoop.mapred.TaskTracker:
>> STARTUP_MSG:
>> /************************************************************
>> STARTUP_MSG: Starting TaskTracker
>> STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 1.0.2
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r
>> 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
>> ************************************************************/
>> 2012-08-13 00:59:24,569 INFO
>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>> hadoop-metrics2.properties
>> 2012-08-13 00:59:24,626 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> MetricsSystem,sub=Stats registered.
>> 2012-08-13 00:59:24,627 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>> period at 10 second(s).
>> 2012-08-13 00:59:24,627 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics
>> system started
>> 2012-08-13 00:59:24,950 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
>> registered.
>> 2012-08-13 00:59:25,146 INFO org.mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 2012-08-13 00:59:25,206 INFO org.apache.hadoop.http.HttpServer: Added
>> global filtersafety
>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> 2012-08-13 00:59:25,232 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
>> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>> 2012-08-13 00:59:25,237 INFO org.apache.hadoop.mapred.TaskTracker:
>> Starting tasktracker with owner as bmacek
>> 2012-08-13 00:59:25,239 INFO org.apache.hadoop.mapred.TaskTracker: Good
>> mapred local directories are: /home/work/bmacek/hadoop/hdfs/tmp/mapred/local
>> 2012-08-13 00:59:25,244 INFO org.apache.hadoop.util.NativeCodeLoader:
>> Loaded the native-hadoop library
>> 2012-08-13 00:59:25,255 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm
>> registered.
>> 2012-08-13 00:59:25,256 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> TaskTrackerMetrics registered.
>> 2012-08-13 00:59:25,279 INFO org.apache.hadoop.ipc.Server: Starting
>> SocketReader
>> 2012-08-13 00:59:25,282 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> RpcDetailedActivityForPort54850 registered.
>> 2012-08-13 00:59:25,282 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> RpcActivityForPort54850 registered.
>> 2012-08-13 00:59:25,287 INFO org.apache.hadoop.ipc.Server: IPC Server
>> Responder: starting
>> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server
>> listener on 54850: starting
>> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 0 on 54850: starting
>> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 1 on 54850: starting
>> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker:
>> TaskTracker up at: localhost/127.0.0.1:54850
>> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 3 on 54850: starting
>> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 2 on 54850: starting
>> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker:
>> Starting tracker tracker_its-cs133.its.uni-kassel.de:localhost/
>> 127.0.0.1:54850
>> 2012-08-13 00:59:26,321 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35555. Already tried 0
>> time(s).
>> 2012-08-13 00:59:38,104 INFO org.apache.hadoop.mapred.TaskTracker:
>> Starting thread: Map-events fetcher for all reduce tasks on
>> tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
>> 2012-08-13 00:59:38,120 INFO org.apache.hadoop.util.ProcessTree: setsid
>> exited with exit code 0
>> 2012-08-13 00:59:38,134 INFO org.apache.hadoop.mapred.TaskTracker: Using
>> ResourceCalculatorPlugin :
>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228
>> 2012-08-13 00:59:38,137 WARN org.apache.hadoop.mapred.TaskTracker:
>> TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is
>> disabled.
>> 2012-08-13 00:59:38,145 INFO org.apache.hadoop.mapred.IndexCache:
>> IndexCache created with max memory = 10485760
>> 2012-08-13 00:59:38,158 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> ShuffleServerMetrics registered.
>> 2012-08-13 00:59:38,161 INFO org.apache.hadoop.http.HttpServer: Port
>> returned by webServer.getConnectors()[0].getLocalPort() before open() is
>> -1. Opening the listener on 50060
>> 2012-08-13 00:59:38,161 ERROR org.apache.hadoop.mapred.TaskTracker: Can
>> not start task tracker because java.net.BindException: Address already in
>> use
>>     at sun.nio.ch.Net.bind(Native Method)
>>     at
>> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>>     at
>> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>>     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
>>     at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502)
>>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)
>>
>> 2012-08-13 00:59:38,163 INFO org.apache.hadoop.mapred.TaskTracker:
>> SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down TaskTracker at
>> its-cs133.its.uni-kassel.de/141.51.205.43
>> ************************************************************/
>>
>
>
>
>
>
