hadoop-user mailing list archives

From Gäde, Sebastian <s116...@hft-leipzig.de>
Subject all tasks failing for MR job on Hadoop 2.4
Date Tue, 13 May 2014 10:52:57 GMT
Hi,

I've set up a Hadoop 2.4 cluster with three nodes. The NameNode and ResourceManager run
on one node, the DataNodes and NodeManagers on the other two. All services start up without
problems (as far as I can see), and the web apps show all nodes as running.

However, I am not able to run MapReduce jobs:
yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 1000000
submits the job and it appears in the web app, but its state stays stuck in ACCEPTED.
Instead I'm receiving messages like these:

14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_000000_0, Status : FAILED
14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_000001_0, Status : FAILED


The task log shows:

2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt
to override final parameter: mapreduce.cluster.temp.dir;  Ignoring.
2014-05-13 12:15:27,896 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded
properties from hadoop-metrics2.properties
2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled
snapshot period at 10 second(s).
2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask
metrics system started
2014-05-13 12:15:28,185 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2014-05-13 12:15:28,192 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job,
Service: job_1399971492349_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879)
2014-05-13 12:15:28,453 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before
retrying again. Got null now.
2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client: Address change detected.
Old: localhost/127.0.1.1:41395 New: localhost/127.0.0.1:41395
2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server:
localhost/127.0.0.1:41395. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running
child : java.net.ConnectException: Call From hd-slave-172.ffm.telekom.de/164.26.155.172 to
localhost:41395 failed on connection exception: java.net.ConnectException: Connection
refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
	at org.apache.hadoop.ipc.Client.call(Client.java:1414)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
	at com.sun.proxy.$Proxy9.getTask(Unknown Source)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
	... 4 more
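
For a longer log, pauses like the one between the first two entries above can be found mechanically. The following is a minimal sketch, assuming the log4j-style timestamp prefix seen in this excerpt; the function name `find_gaps` and the 30-second threshold are illustrative choices, not part of Hadoop. It only locates the gap and says nothing about what the process was doing in the meantime.

```python
from datetime import datetime, timedelta
import re

# Matches the leading timestamp of a log line, e.g. "2014-05-13 12:15:28,146".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def find_gaps(lines, threshold=timedelta(seconds=30)):
    """Return (previous, current, gap) for consecutive timestamps further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # wrapped continuation lines and stack frames carry no timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        current += timedelta(milliseconds=int(match.group(2)))
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# The first entries of the log excerpt above, including one wrapped line.
sample = [
    "2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt",
    "to override final parameter: mapreduce.cluster.temp.dir;  Ignoring.",
    "2014-05-13 12:15:27,896 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded",
    "2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled",
]

for before, after, gap in find_gaps(sample):
    print(f"{gap.total_seconds():.3f}s gap between {before.time()} and {after.time()}")
```

On the sample lines this reports a single gap of about 91 seconds, which matches the pause described below.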

I'm not sure about two things:
a) the roughly 90-second gap in the log between 12:13:56 and 12:15:27. I think I'm running
into some kind of timeout, but I don't know how to find out what the system is doing during
that time.
b) the localhost:41395 address. netstat shows no daemon listening on that port. I suppose
this is some kind of local IPC daemon that is also affected by a timeout?
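
Both points can be narrowed down with two small checks on the worker node: how the local names resolve (the log reports an address change from 127.0.1.1 to 127.0.0.1 for localhost) and whether anything accepts connections on the port in question. The sketch below is only a diagnostic aid under those assumptions; `resolve_all` and `is_listening` are hypothetical helper names, and port 41395 is simply the value from this particular log and will differ from job to job.

```python
import socket


def resolve_all(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # the name does not resolve at all
    return sorted({info[4][0] for info in infos})


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port is accepted within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable


if __name__ == "__main__":
    name = socket.gethostname()
    print(f"{name} resolves to {resolve_all(name)}")
    print(f"localhost resolves to {resolve_all('localhost')}")
    # 41395 is the port from the log above; adjust it to the failing attempt.
    print(f"localhost:41395 listening: {is_listening('localhost', 41395)}")
```

Since the stack trace shows the failing call is getTask issued from YarnChild.main, it may be worth running the same checks on the other worker node as well, in case the process the task is trying to reach was started there and only advertised itself as localhost.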

Any ideas?

Cheers
Seb.