hadoop-yarn-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing
Date Mon, 26 Oct 2015 10:50:27 GMT

    [ https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974013#comment-14974013 ]

Steve Loughran commented on YARN-4251:
--------------------------------------

bq. Also the dismissive nature of the wiki: 
"Finally, this is not a Hadoop problem, it is a host, network or Hadoop configuration problem.
As it is your cluster, only you can find out and track down the problem.. Sorry"
bq. Everything worked fine one day. I upgrade hadoop it stops working. The wiki ends with
a bold claim that every bind exception that starts the day after upgrade is not a hadoop problem.

Edward, I can assure you that most of the JIRAs we get related to ConnectionRefused, BindException,
NoRouteToHostException, etc. are related to system configs. It is almost invariably some
machine config issue, be it Ubuntu mapping the machine's hostname to 127.0.1.1, a firewall in the way,
broken rDNS, or others. And we get so many people complaining that the namenode is refusing connections
when either the firewall is up, the port settings for the client are wrong, the hostname is
wrong or the NN isn't up. Same for BindException.
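
For anyone triaging one of these, a quick check of two of the common causes (what the local hostname resolves to, and whether the port is already taken) might look like the sketch below. It uses only the JDK's java.net classes and is not anything from Hadoop; the class name BindDiagnostics and the method canBind are made up for illustration, and 9030 is simply the port from the trace in this issue.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper for illustration; not part of Hadoop.
class BindDiagnostics {

    // Returns true if a listening socket can be opened on the wildcard
    // address at the given port, false if the bind attempt fails.
    static boolean canBind(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.setReuseAddress(false);
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            // A BindException (port in use, or no permission) lands here.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Shows what the local hostname resolves to, e.g. a 127.x address.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostname resolves to " + local.getHostAddress()
            + " (loopback: " + local.isLoopbackAddress() + ")");
        System.out.println("port 9030 free: " + canBind(9030));
    }
}
```

If the port reports as not free, something else on the machine is already listening on it, which is a host problem rather than a Hadoop one.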

We've gone to the effort of adding wrappers around all socket exceptions to add in hostnames
and ports (the things people who understand networking need), and wiki entries to help people
fend for themselves and not file Critical issues about problems that they generally have to
fix for themselves. Yet even with those exceptions pointing to the wiki entry, we still
get people not following the link, but going straight to JIRA: HADOOP-12391.
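
To make the wrapping idea concrete, here is a minimal sketch of decorating a BindException with the host, port and wiki link. It is an illustration only: the class BindErrorSketch and the method wrapBind are hypothetical names, and this is not the actual Hadoop code (the trace in this issue shows the real wrapping going through org.apache.hadoop.net.NetUtils). The message layout is copied from the exception text in that trace.

```java
import java.net.BindException;

// Hypothetical sketch; Hadoop's real wrapping lives in org.apache.hadoop.net.NetUtils.
class BindErrorSketch {

    static final String WIKI = "http://wiki.apache.org/hadoop/BindException";

    // Builds a new BindException whose message names the host and port,
    // includes the original exception text, and points at the wiki page.
    static BindException wrapBind(String host, int port, BindException cause) {
        BindException wrapped = new BindException(
            "Problem binding to [" + host + ":" + port + "] " + cause
                + "; For more details see:  " + WIKI);
        // Keep the original exception so its stack trace is not lost.
        wrapped.initCause(cause);
        return wrapped;
    }

    public static void main(String[] args) {
        BindException raw = new BindException("Address already in use: bind");
        System.out.println(wrapBind("0.0.0.0", 9030, raw).getMessage());
    }
}
```

Keeping the same exception type means callers that catch BindException still work, while whoever reads the log gets the address and port without needing a debugger.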

If you look at the history of those wiki entries, you can see that they continually grow as
we find new system setup issues which trigger the exception. That's because I do hit problems,
I do fix them myself, and whenever I do that, I add another line. If you've found a new way to
trigger it, then once it is fixed, I encourage you to add a new entry. And, at the same time, you
are free to change that text at the end.

> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-4251
>                 URL: https://issues.apache.org/jira/browse/YARN-4251
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>         Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:9030] java.net.BindException: Address already in use: bind; For more details see:  http://wiki.apache.org/hadoop/BindException
> 	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
> 	at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
> 	at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Unknown Source)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
> 	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> 	at java.lang.reflect.Method.invoke(Unknown Source)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] java.net.BindException: Address already in use: bind; For more details see:  http://wiki.apache.org/hadoop/BindException
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
> 	at java.lang.reflect.Constructor.newInstance(Unknown Source)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
> 	at org.apache.hadoop.ipc.Server.bind(Server.java:486)
> 	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:646)
> 	at org.apache.hadoop.ipc.Server.<init>(Server.java:2399)
> 	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:946)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:537)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
> 	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
> 	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
> 	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
> 	... 27 more
> Caused by: java.net.BindException: Address already in use: bind
> 	at sun.nio.ch.Net.bind0(Native Method)
> 	at sun.nio.ch.Net.bind(Unknown Source)
> 	at sun.nio.ch.Net.bind(Unknown Source)
> 	at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
> 	at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
> 	at org.apache.hadoop.ipc.Server.bind(Server.java:469)
> 	... 35 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)