hadoop-hdfs-user mailing list archives

From Sebastian Gäde <s116...@hft-leipzig.de>
Subject Re: all tasks failing for MR job on Hadoop 2.4
Date Sun, 18 May 2014 15:27:48 GMT
Thanks for your feedback. No 'localhost' in the conf files...

However, as I'm not relying on DNS but on the hosts files on the nodes, 
I found that there was a missing entry on one node for its hostname 
pointing to its own IP address. Since I fixed that, MR jobs are working 
fine. :-)
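
For anyone who wants to check their own nodes for the same kind of problem, here is a minimal sketch in Python. It parses hosts-file text and reports whether a hostname has no entry at all or is mapped only to a loopback address. The function names, the sample hosts content, and the hostname hd-slave-171 are made up for illustration; whether the loopback case matches the exact state of the affected node above is an assumption.

```python
import ipaddress


def addresses_for(hosts_text, hostname):
    """Return every address that the hosts-file text maps to `hostname`."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if hostname in names:
            found.append(address)
    return found


def is_loopback(address):
    """True if `address` parses as an IP address in a loopback range."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False  # not a plain IP address; ignored in this sketch


def check_hostname(hosts_text, hostname):
    """Return a list of problems with the mapping of `hostname` (empty if none)."""
    addresses = addresses_for(hosts_text, hostname)
    if not addresses:
        return [f"{hostname}: no entry"]
    return [
        f"{hostname}: mapped to loopback address {a}"
        for a in addresses
        if is_loopback(a)
    ]


# Hypothetical hosts file, for illustration only.
SAMPLE = """\
127.0.0.1       localhost
127.0.1.1       hd-slave-172
164.26.155.171  hd-slave-171   # made-up neighbour node
"""

for name in ("hd-slave-172", "hd-slave-171", "hd-master"):
    print(name, check_hostname(SAMPLE, name))
```

To check a real node, one could pass the contents of /etc/hosts and the node's own hostname (for example from socket.gethostname()) instead of the sample text.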

Cheers
Seb.

On 16.05.2014 04:07, Stanley Shi wrote:
> Please check your configuration files: do they mention "localhost"
> anywhere? "localhost" should not be used if you are deploying a
> distributed cluster.
>
> Regards,
> *Stanley Shi,*
>
>
>
> On Tue, May 13, 2014 at 6:52 PM, Gäde, Sebastian
> <s116102@hft-leipzig.de> wrote:
>
>     Hi,
>
>     I've set up a Hadoop 2.4 cluster with three nodes. Namenode and
>     Resourcemanager are running on one node, Datanodes and Nodemanagers
>     on the other two. All services are starting up without problems (as
>     far as I can see), web apps show all nodes as running.
>
>     However, I am not able to run MapReduce jobs:
>     yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 1000000
>     submits the job, it appears in the web app, but state is stuck in
>     ACCEPTED. Instead I'm receiving messages:
>
>     14/05/13 12:15:48 INFO mapreduce.Job: Task Id :
>     attempt_1399971492349_0004_m_000000_0, Status : FAILED
>     14/05/13 12:15:48 INFO mapreduce.Job: Task Id :
>     attempt_1399971492349_0004_m_000001_0, Status : FAILED
>
>
>     the log shows:
>
>     2014-05-13 12:13:56,702 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to override
>     final parameter: mapreduce.cluster.temp.dir;  Ignoring.
>     2014-05-13 12:15:27,896 INFO [main]
>     org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties
>     from hadoop-metrics2.properties
>     2014-05-13 12:15:28,146 INFO [main]
>     org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled
>     snapshot period at 10 second(s).
>     2014-05-13 12:15:28,146 INFO [main]
>     org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics
>     system started
>     2014-05-13 12:15:28,185 INFO [main]
>     org.apache.hadoop.mapred.YarnChild: Executing with tokens:
>     2014-05-13 12:15:28,192 INFO [main]
>     org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service:
>     job_1399971492349_0004, Ident:
>     (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879)
>     2014-05-13 12:15:28,453 INFO [main]
>     org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying
>     again. Got null now.
>     2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client:
>     Address change detected. Old: localhost/127.0.1.1:41395
>     New: localhost/127.0.0.1:41395
>     2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 0 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 1 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 2 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 3 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 4 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 5 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 6 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 7 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 8 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client:
>     Retrying connect to server: localhost/127.0.0.1:41395. Already
>     tried 9 time(s); retry policy is
>     RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>     MILLISECONDS)
>     2014-05-13 12:15:38,675 WARN [main]
>     org.apache.hadoop.mapred.YarnChild: Exception running child :
>     java.net.ConnectException: Call From
>     hd-slave-172.ffm.telekom.de/164.26.155.172 to
>     localhost:41395 failed on connection exception:
>     java.net.ConnectException: Connection refused; For more
>     details see: http://wiki.apache.org/hadoop/ConnectionRefused
>              at
>     sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>              at
>     sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>              at
>     sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>              at
>     java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>              at
>     org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>              at
>     org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>              at org.apache.hadoop.ipc.Client.call(Client.java:1414)
>              at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>              at
>     org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
>              at com.sun.proxy.$Proxy9.getTask(Unknown Source)
>              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)
>     Caused by: java.net.ConnectException: Connection refused
>              at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>              at
>     sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>              at
>     org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>              at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>              at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>              at
>     org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
>              at
>     org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
>              at
>     org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
>              at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
>              at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>              ... 4 more
>
>     Not sure about
>     a) the 90-second gap between 12:13 and 12:15. I think I'm running
>     into some kind of timeout, but I don't know how to find out what the
>     system is doing during that time.
>     b) the localhost:41395. I cannot find a daemon listening using
>     netstat. I suppose this is some kind of local IPC daemon which is
>     also affected by a timeout?
>
>     Any ideas?
>
>     Cheers
>     Seb.
>
>
