From: Björn-Elmar Macek <ema@cs.uni-kassel.de>
Date: Mon, 13 Aug 2012 16:57:52 +0200
To: user@hadoop.apache.org
Subject: Re: DataNode and TaskTracker communication
Hi,

By "using DNS" you mean using the servers' hostnames rather than their IP addresses, right?
If so, I do use DNS. Since I am working in a SLURM environment and get a list of nodes for every job I schedule, I construct the config files for every job by taking the list of assigned nodes and dividing the roles (NameNode, JobTracker, SecondaryNameNode, TaskTrackers, DataNodes) over this set of machines. SLURM offers me names like "its-cs<nodenumber>", which is enough for ssh to connect - maybe it isn't for all Hadoop processes. The complete names would be "its-cs<nodenumber>.its.uni-kassel.de". I will add this part of the address for testing. But I fear it won't help a lot, because the JobTracker's log seems to know the full names already:
###
2012-08-13 01:12:02,770 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208130059_0001_m_000887 has split on node:/default-rack/its-cs202.its.uni-kassel.de
2012-08-13 01:12:02,770 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208130059_0001_m_000888 has split on node:/default-rack/its-cs202.its.uni-kassel.de
2012-08-13 01:12:02,770 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208130059_0001_m_000889 has split on node:/default-rack/its-cs195.its.uni-kassel.de
2012-08-13 01:12:02,770 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208130059_0001_m_000890 has split on node:/default-rack/its-cs196.its.uni-kassel.de
2012-08-13 01:12:02,770 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208130059_0001_m_000891 has split on node:/default-rack/its-cs201.its.uni-kassel.de
###
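Roughly, the role assignment I described works like this (a simplified sketch, not my actual scripts - those also fill in the *-site.xml files with the master's name):

```python
# Sketch of dividing the Hadoop 1.x roles over the node list SLURM
# assigns to a job. The helper name and layout are illustrative only.
def write_role_files(nodes, conf_dir="conf"):
    # First node doubles as NameNode and JobTracker, the second becomes
    # the SecondaryNameNode, and the rest serve as DataNodes/TaskTrackers.
    master, secondary, slaves = nodes[0], nodes[1], nodes[2:]
    with open(f"{conf_dir}/masters", "w") as f:
        f.write(secondary + "\n")  # 'masters' lists the SecondaryNameNode
    with open(f"{conf_dir}/slaves", "w") as f:
        f.write("\n".join(slaves) + "\n")
    return master, secondary, slaves
```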

Pings work, by the way: I could ping the NameNode from all problematic nodes. And lsof -i didn't show any other programs on the NameNode/JobTracker node using the problematic ports. :( Maybe worth noting: the NameNode/JobTracker server is at the moment not running anymore, although the DataNode/TaskTracker logs are still growing.
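The reachability check I did can also be scripted; this is a minimal sketch using only the Python stdlib, with the host and ports taken from the logs below:

```python
import socket

def port_open(host, port, timeout=5.0):
    """Try a plain TCP connect - the same thing the Hadoop IPC client attempts."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers both refusal and failed name resolution
        return False

# The NameNode/JobTracker RPC ports from the logs; adjust as needed.
for port in (35554, 35555):
    print("its-cs131:%d open: %s" % (port, port_open("its-cs131", port)))
```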


Concerning IPv6: as far as I can see, I would have to modify global config files to disable it. Since I am only a user of this cluster with very limited insight into why the machines are configured the way they are, I want to be very careful about asking the technicians to make changes to their setup. I don't want to be disrespectful.
I will try using the full names first, and if this doesn't help, I will of course ask them whether other options are available.
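That said, as far as I understand, the IPv4 preference can also be set per JVM rather than system-wide, e.g. in conf/hadoop-env.sh:

```shell
# Ask the Hadoop daemons' JVMs to prefer the IPv4 stack; this affects
# only my own processes, not the cluster's global configuration.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
```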


On 13.08.12 16:12, Mohammad Tariq wrote:
Hi Michael,
       I asked for the hosts file because there seems to be a loopback problem to me. The log shows that the call is going to 0.0.0.0. Apart from what you have said, I think disabling IPv6 and making sure that there is no problem with the DNS resolution is also necessary. Please correct me if I am wrong. Thank you.

Regards,
    Mohammad Tariq



On Mon, Aug 13, 2012 at 7:06 PM, Michael Segel <michael_segel@hotmail.com> wrote:
Based on your /etc/hosts output, why aren't you using DNS? 

Outside of MapR, multihomed machines can be problematic. Hadoop doesn't generally work well when you're not using the FQDN or its alias. 

The issue isn't the SSH. But if you go to the node which is having trouble connecting to another node, then try to ping it, or some other general communication: if it succeeds, your issue is that the port you're trying to communicate with is blocked. Then it's more than likely an IP config or firewall issue.

On Aug 13, 2012, at 8:17 AM, Björn-Elmar Macek <ema@cs.uni-kassel.de> wrote:

Hi Michael,

Well, I can ssh from any node to any other without being prompted. The reason for this is that my home dir is mounted on every server in the cluster.

Whether the machines are multihomed: I don't know. I could ask, if this would be of importance.

Shall I?

Regards,
Elmar

On 13.08.12 14:59, Michael Segel wrote:
If the nodes can communicate and distribute data, then the odds are that the issue isn't going to be in his /etc/hosts. 

A more relevant question is whether he's running a firewall on each of these machines.

A simple test... ssh to one node, ping other nodes and the control nodes at random to see if they can see one another. Then check to see if there is a firewall running which would limit the types of traffic between nodes. 

One other side note... are these machines multi-homed?

On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <dontariq@gmail.com> wrote:

Hello there,

     Could you please share your /etc/hosts file, if you don't mind.

Regards,
    Mohammad Tariq



On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <macek@cs.uni-kassel.de> wrote:
Hi,

I am currently trying to run my Hadoop program on a cluster. Sadly, my DataNodes and TaskTrackers seem to have difficulties with their communication, as their logs say:
* Some DataNodes and TaskTrackers seem to have port problems of some kind, as can be seen in the logs below. I wondered if this might be correlated with the localhost entry in /etc/hosts, as you can read in a lot of posts with similar errors, but I checked the file: neither localhost nor 127.0.0.1/127.0.1.1 is bound there. (Although you can ping localhost... the technician of the cluster said he'd look into what resolves localhost.)
* The other nodes cannot talk to the NameNode and JobTracker (its-cs131), although it is absolutely not clear why this is happening: the "dfs -put" I do directly before the job runs fine, which seems to imply that communication between those servers works flawlessly.
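To double-check the resolution side of this, a tiny probe like the following (plain Python stdlib; the hostnames are examples from my cluster) shows what each name maps to:

```python
import socket

def resolve(host):
    """Return the IPv4 address a hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

# A node whose own FQDN resolves to 127.0.0.1/127.0.1.1 will register
# its daemons on the loopback interface, matching the 0.0.0.0 symptom.
for host in ("localhost", "its-cs133.its.uni-kassel.de"):
    print(host, "->", resolve(host))
```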

Is there any reason why this might happen?


Regards,
Elmar

LOGS BELOW:

\____Datanodes

After successfully putting the data to HDFS (at this point I thought the NameNode and DataNodes have to communicate), I get the following errors when starting the job:

There are two kinds of logs I found: the first one is big (about 12 MB) and looks like this:
############################### LOG TYPE 1 ############################################################
2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 1 time(s).
2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 2 time(s).
2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 3 time(s).
2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 4 time(s).
2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 5 time(s).
2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 6 time(s).
2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 7 time(s).
2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 8 time(s).
2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 9 time(s).
2012-08-13 08:23:36,335 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: Call to its-cs131/141.51.205.41:35554 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
    at org.apache.hadoop.ipc.Client.call(Client.java:1071)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy5.sendHeartbeat(Unknown Source)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
    at org.apache.hadoop.ipc.Client.call(Client.java:1046)
    ... 5 more

... (this continues til the end of the log)

The second kind is short:
########################### LOG TYPE 2 ############################################################
2012-08-13 00:59:19,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
************************************************************/
2012-08-13 00:59:19,203 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-08-13 00:59:19,216 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-08-13 00:59:19,217 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-08-13 00:59:19,218 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-08-13 00:59:19,306 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-08-13 00:59:19,346 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-08-13 00:59:20,482 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
2012-08-13 00:59:21,584 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/work/bmacek/hadoop/hdfs/slave is not formatted.
2012-08-13 00:59:21,584 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2012-08-13 00:59:21,787 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2012-08-13 00:59:21,897 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
2012-08-13 00:59:21,897 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
2012-08-13 00:59:21,898 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException: Problem binding to /0.0.0.0:50010 : Address already in use
    at org.apache.hadoop.ipc.Server.bind(Server.java:227)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at org.apache.hadoop.ipc.Server.bind(Server.java:225)
    ... 7 more

2012-08-13 00:59:21,899 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at its-cs133.its.uni-kassel.de/141.51.205.43
************************************************************/
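
(The "Address already in use" on 0.0.0.0:50010 in LOG TYPE 2 usually means another process, e.g. a leftover DataNode from an earlier SLURM job, already holds the port. A small check one could run on the affected host before starting the daemon; this is a generic sketch, not Hadoop API:)

```python
import socket

def port_in_use(port, host=""):
    """Try to bind the port ourselves; if bind fails, another process holds it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))  # empty host = all interfaces, like 0.0.0.0
        return False
    except OSError:
        return True
    finally:
        s.close()

# 50010 is the default DataNode data-transfer port, 50060 the TaskTracker HTTP port.
for p in (50010, 50060):
    print("port %d: %s" % (p, "in use" if port_in_use(p) else "free"))
```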





\_____TaskTracker
With the TaskTrackers it is the same: there are two kinds of logs.
############################### LOG TYPE 1 ############################################################
2012-08-13 02:09:54,645 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'its-cs131' with reponseId '879
2012-08-13 02:09:55,646 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
2012-08-13 02:09:56,646 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 1 time(s).
2012-08-13 02:09:57,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 2 time(s).
2012-08-13 02:09:58,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 3 time(s).
2012-08-13 02:09:59,648 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 4 time(s).
2012-08-13 02:10:00,648 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 5 time(s).
2012-08-13 02:10:01,649 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 6 time(s).
2012-08-13 02:10:02,649 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 7 time(s).
2012-08-13 02:10:03,650 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 8 time(s).
2012-08-13 02:10:04,650 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 9 time(s).
2012-08-13 02:10:04,651 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.ConnectException: Call to its-cs131/141.51.205.41:35555 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
    at org.apache.hadoop.ipc.Client.call(Client.java:1071)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown Source)
    at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
    at org.apache.hadoop.ipc.Client.call(Client.java:1046)
    ... 6 more


########################### LOG TYPE 2 ############################################################
2012-08-13 00:59:24,376 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
************************************************************/
2012-08-13 00:59:24,569 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-08-13 00:59:24,626 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-08-13 00:59:24,627 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-08-13 00:59:24,627 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2012-08-13 00:59:24,950 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-08-13 00:59:25,146 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-08-13 00:59:25,206 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-08-13 00:59:25,232 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-08-13 00:59:25,237 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as bmacek
2012-08-13 00:59:25,239 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /home/work/bmacek/hadoop/hdfs/tmp/mapred/local
2012-08-13 00:59:25,244 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-08-13 00:59:25,255 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-08-13 00:59:25,256 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2012-08-13 00:59:25,279 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-08-13 00:59:25,282 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort54850 registered.
2012-08-13 00:59:25,282 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort54850 registered.
2012-08-13 00:59:25,287 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54850: starting
2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54850: starting
2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54850: starting
2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:54850
2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54850: starting
2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54850: starting
2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
2012-08-13 00:59:26,321 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
2012-08-13 00:59:38,104 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
2012-08-13 00:59:38,120 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2012-08-13 00:59:38,134 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228
2012-08-13 00:59:38,137 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.
2012-08-13 00:59:38,145 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2012-08-13 00:59:38,158 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ShuffleServerMetrics registered.
2012-08-13 00:59:38,161 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
2012-08-13 00:59:38,161 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)

2012-08-13 00:59:38,163 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at its-cs133.its.uni-kassel.de/141.51.205.43
************************************************************/





