From: Sebastian Gäde
To: user@hadoop.apache.org
Date: Sun, 18 May 2014 17:27:48 +0200
Subject: Re: all tasks failing for MR job on Hadoop 2.4
Message-ID: <5378D174.1040806@hft-leipzig.de>

Thanks for your feedback. No 'localhost' in the conf files...

However, as I'm not relying on DNS but on the hosts files on the nodes,
I found that one node was missing the entry that maps its own hostname
to its own IP address. Since I fixed that, MR jobs are working fine. :-)

Cheers
Seb.

On 16.05.2014 04:07, Stanley Shi wrote:
> Please check your configuration files: do they mention "localhost"
> anywhere? "localhost" should not be used if you are deploying a
> distributed cluster.
>
> Regards,
> *Stanley Shi,*
>
> On Tue, May 13, 2014 at 6:52 PM, Gäde, Sebastian wrote:
>
> > Hi,
> >
> > I've set up a Hadoop 2.4 cluster with three nodes. Namenode and
> > Resourcemanager are running on one node, Datanodes and Nodemanagers
> > on the other two. All services are starting up without problems (as
> > far as I can see), web apps show all nodes as running.
> >
> > However, I am not able to run MapReduce jobs:
> > yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 1000000
> > submits the job, it appears in the web app, but state is stuck in
> > ACCEPTED. Instead I'm receiving messages:
> >
> > 14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_000000_0, Status : FAILED
> > 14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_000001_0, Status : FAILED
> >
> > the log shows:
> >
> > 2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.cluster.temp.dir; Ignoring.
> > 2014-05-13 12:15:27,896 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > 2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > 2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
> > 2014-05-13 12:15:28,185 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
> > 2014-05-13 12:15:28,192 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1399971492349_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879)
> > 2014-05-13 12:15:28,453 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
> > 2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client: Address change detected. Old: localhost/127.0.1.1:41395 New: localhost/127.0.0.1:41395
> > 2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > 2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From hd-slave-172.ffm.telekom.de/164.26.155.172 to localhost:41395 failed on connection exception: java.net.ConnectException: Verbindungsaufbau abgelehnt; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> >     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> >     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
> >     at org.apache.hadoop.ipc.Client.call(Client.java:1414)
> >     at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> >     at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
> >     at com.sun.proxy.$Proxy9.getTask(Unknown Source)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)
> > Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> >     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> >     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
> >     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
> >     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
> >     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
> >     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
> >     at org.apache.hadoop.ipc.Client.call(Client.java:1381)
> >     ... 4 more
> >
> > Not sure about
> > a) the 90-second gap between 12:13 and 12:15. I think I'm running
> > into some kind of timeout, but I don't know how to find out what the
> > system is doing during that time.
> > b) the localhost:41395. I cannot find a daemon listening using
> > netstat. I suppose this is some kind of local IPC daemon which is
> > also affected by a timeout?
> >
> > Any ideas?
> >
> > Cheers
> > Seb.
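
The fix reported in this thread (one node lacked a hosts-file entry mapping its own hostname to its own IP address) can be checked for mechanically. The following is a minimal Python sketch, not something taken from the thread: parse_hosts and check_hostname are hypothetical helper names. It reports a hostname that has no hosts-file entry at all. It also flags a hostname that maps to a loopback address such as 127.0.1.1; that second check is an assumption prompted by the "Address change detected" line in the quoted log, and the thread does not confirm it as the cause.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to the IPs that name it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), []).append(ip)
    return mapping


def check_hostname(hostname, hosts_text):
    """Return a list of problems found for `hostname`; empty means none found."""
    ips = parse_hosts(hosts_text).get(hostname.lower(), [])
    if not ips:
        return [f"{hostname}: no entry in hosts file"]
    problems = []
    for ip in ips:
        try:
            loopback = ipaddress.ip_address(ip).is_loopback
        except ValueError:
            problems.append(f"{hostname}: unparsable address {ip}")
            continue
        if loopback:
            # Assumption: a loopback mapping is undesirable on a cluster node.
            problems.append(f"{hostname}: maps to loopback address {ip}")
    return problems
```

On each node one could pass the output of socket.gethostname() and the contents of /etc/hosts to check_hostname and treat an empty list as a pass. The sketch only inspects the hosts file; it does not query DNS or look at any Hadoop configuration.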