hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2710) RM HA tests failed intermittently on trunk
Date Sun, 14 Dec 2014 21:04:14 GMT

    [ https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246125#comment-14246125
] 

Steve Loughran commented on YARN-2710:
--------------------------------------

Immediate failure trigger is that the retry handler says "Not retrying because failovers (5)
exceeded maximum allowed (5)"

If you look at the lines immediately above it, the RM is still bootstrapping hence not listening
for connections.

The test is probably just starting too early. Recommend changing the failure/retry policy
to to add some backoff & maybe more retries

{code}
014-12-14 11:40:06,693 INFO  [Thread-442] resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(165))
- USER=jenkins	OPERATION=refreshSuperUserGroupsConfiguration	TARGET=AdminService	RESULT=SUCCESS
2014-12-14 11:40:06,693 INFO  [Thread-442] conf.Configuration (Configuration.java:getConfResourceAsInputStream(2240))
- found resource core-site.xml at file:/home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk-Java8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/test-classes/core-site.xml
2014-12-14 11:40:06,694 INFO  [Thread-442] security.Groups (Groups.java:refresh(248)) - clearing
userToGroupsMap cache
2014-12-14 11:40:06,694 INFO  [Thread-442] resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(165))
- USER=jenkins	OPERATION=refreshUserToGroupsMappings	TARGET=AdminService	RESULT=SUCCESS
2014-12-14 11:40:06,694 INFO  [Thread-442] resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(165))
- USER=jenkins	OPERATION=transitionToActive	TARGET=RMHAProtocolService	RESULT=SUCCESS
2014-12-14 11:40:07,539 INFO  [Thread-443] client.ConfiguredRMFailoverProxyProvider (ConfiguredRMFailoverProxyProvider.java:performFailover(100))
- Failing over to rm2
2014-12-14 11:40:07,541 WARN  [Thread-443] retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(117))
- Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainers
over rm2. Not retrying because failovers (5) exceeded maximum allowed (5)
java.net.ConnectException: Call From asf908.gq1.ygridcore.net/67.195.81.152 to asf908.gq1.ygridcore.net:28032
failed on connection exception: java.net.ConnectException: Connection refused; For more details
see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy16.getContainers(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainers(ApplicationClientProtocolPBClientImpl.java:410)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
	at com.sun.proxy.$Proxy17.getContainers(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:654)
	at org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetContainersOnHA(TestApplicationClientProtocolOnHA.java:155)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
	at org.apache.hadoop.ipc.Client.call(Client.java:1438)
	... 23 more
{code}

> RM HA tests failed intermittently on trunk
> ------------------------------------------
>
>                 Key: YARN-2710
>                 URL: https://issues.apache.org/jira/browse/YARN-2710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>            Reporter: Wangda Tan
>         Attachments: TestResourceTrackerOnHA-output.2.txt, org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt
>
>
> Failure like, it can be happened in TestApplicationClientProtocolOnHA, TestResourceTrackerOnHA,
etc.
> {code}
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
> testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
 Time elapsed: 9.491 sec  <<< ERROR!
> java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 to asf905.gq1.ygridcore.net:28032
failed on connection exception: java.net.ConnectException: Connection refused; For more details
see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
> 	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1438)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> 	at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source)
> 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
> 	at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source)
> 	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583)
> 	at org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message