hadoop-hdfs-user mailing list archives

From Stanley Shi <s...@gopivotal.com>
Subject Re: all tasks failing for MR job on Hadoop 2.4
Date Fri, 16 May 2014 02:07:22 GMT
Please check your configuration files: do any of them mention
"localhost"? "localhost" should not be used if you are deploying a
distributed cluster.
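
One way to run that check is a small script that scans the configuration directory for loopback references. This is only a sketch under stated assumptions: the function name `find_loopback_refs` is hypothetical, the default directory `/etc/hadoop/conf` is a guess that may not match your installation, and only `*.xml` files are examined. It also flags 127.x.x.x addresses, since the task log in the quoted message shows the client resolving to `localhost/127.0.1.1`.

```python
import re
import sys
from pathlib import Path

# Matches "localhost" or any 127.x.x.x loopback address.
LOOPBACK = re.compile(r"localhost|127\.\d{1,3}\.\d{1,3}\.\d{1,3}", re.IGNORECASE)


def find_loopback_refs(conf_dir):
    """Return (file name, line number, line text) for every loopback reference.

    Hypothetical helper for illustration; it only looks at *.xml files
    directly inside conf_dir.
    """
    hits = []
    for path in sorted(Path(conf_dir).glob("*.xml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LOOPBACK.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # The default directory is an assumption; pass your own as the first argument.
    conf_dir = sys.argv[1] if len(sys.argv) > 1 else "/etc/hadoop/conf"
    for name, lineno, line in find_loopback_refs(conf_dir):
        print(f"{name}:{lineno}: {line}")
```

Running it on every node with that node's configuration directory as the argument lists each file and line that still refers to the loopback interface. An empty result only means the XML files are clean; it says nothing about hostname resolution on the machine.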

Regards,
*Stanley Shi,*



On Tue, May 13, 2014 at 6:52 PM, Gäde, Sebastian <s116102@hft-leipzig.de> wrote:

> Hi,
>
> I've set up a Hadoop 2.4 cluster with three nodes. Namenode and
> Resourcemanager are running on one node, Datanodes and Nodemanagers on the
> other two. All services are starting up without problems (as far as I can
> see), web apps show all nodes as running.
>
> However, I am not able to run MapReduce jobs:
> yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 1000000
> submits the job and it appears in the web app, but its state is stuck in
> ACCEPTED. Instead I'm receiving these messages:
>
> 14/05/13 12:15:48 INFO mapreduce.Job: Task Id :
> attempt_1399971492349_0004_m_000000_0, Status : FAILED
> 14/05/13 12:15:48 INFO mapreduce.Job: Task Id :
> attempt_1399971492349_0004_m_000001_0, Status : FAILED
>
>
> The log shows:
>
> 2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration:
> job.xml:an attempt to override final parameter: mapreduce.cluster.temp.dir;
>  Ignoring.
> 2014-05-13 12:15:27,896 INFO [main]
> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
> hadoop-metrics2.properties
> 2014-05-13 12:15:28,146 INFO [main]
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
> period at 10 second(s).
> 2014-05-13 12:15:28,146 INFO [main]
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system
> started
> 2014-05-13 12:15:28,185 INFO [main] org.apache.hadoop.mapred.YarnChild:
> Executing with tokens:
> 2014-05-13 12:15:28,192 INFO [main] org.apache.hadoop.mapred.YarnChild:
> Kind: mapreduce.job, Service: job_1399971492349_0004, Ident:
> (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879)
> 2014-05-13 12:15:28,453 INFO [main] org.apache.hadoop.mapred.YarnChild:
> Sleeping for 0ms before retrying again. Got null now.
> 2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client: Address
> change detected. Old: localhost/127.0.1.1:41395 New: localhost/
> 127.0.0.1:41395
> 2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 0 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 1 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 2 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 3 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 4 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 5 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 6 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 7 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 8 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 9 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.net.ConnectException: Call From
> hd-slave-172.ffm.telekom.de/164.26.155.172 to localhost:41395 failed on
> connection exception: java.net.ConnectException: Verbindungsaufbau
> abgelehnt; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>         at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at
> org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1414)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>         at
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
>         at com.sun.proxy.$Proxy9.getTask(Unknown Source)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)
> Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>         at
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
>         at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
>         at
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>         ... 4 more
>
> I'm not sure about:
> a) the roughly 90-second gap between 12:13 and 12:15. I think I'm running
> into some kind of timeout, but I don't know how to find out what the system
> is doing during that time.
> b) the localhost:41395. I cannot find a daemon listening on that port using
> netstat. I suppose this is some kind of local IPC daemon which is also
> affected by a timeout?
>
> Any ideas?
>
> Cheers
> Seb.
