hadoop-yarn-issues mailing list archives

From "Hitesh Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-196) Nodemanager should be more robust in handling connection failure to ResourceManager when a cluster is started
Date Fri, 15 Mar 2013 18:10:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603621#comment-13603621 ]

Hitesh Shah commented on YARN-196:
----------------------------------

Committed to branch-2 and trunk. Thanks Xuan for addressing the numerous review comments and
being so patient. I have also filed a related jira regarding similar handling of connection
loss after the NM is up. 
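In practice, "more robust" means that the NM does not give up on the first refused connection. Instead, it keeps retrying registration for a bounded time. The following is a minimal Java sketch of that idea. It is not the committed YARN-196 patch. `RegistrationCall` and `registerWithRetry` are hypothetical names standing in for the NM-to-RM registration RPC, and the stand-in throws a plain `ConnectException` rather than the wrapped exceptions seen in the trace. The wait and interval values are illustrative, not YARN defaults.

```java
import java.net.ConnectException;
import java.time.Duration;

/**
 * Sketch of a bounded retry loop for NodeManager-to-ResourceManager registration.
 * HYPOTHETICAL: RegistrationCall and registerWithRetry are illustrative names,
 * not the YARN API or the committed YARN-196 code.
 */
public class RegistrationRetrySketch {

    /** Hypothetical stand-in for the RPC that registers the NM with the RM. */
    @FunctionalInterface
    public interface RegistrationCall {
        void register() throws ConnectException;
    }

    /**
     * Calls register() until it succeeds or maxWait would be exceeded, sleeping
     * retryInterval between attempts. Returns the number of attempts made and
     * rethrows the last ConnectException once the wait budget is spent.
     */
    public static int registerWithRetry(RegistrationCall call, Duration maxWait, Duration retryInterval)
            throws ConnectException, InterruptedException {
        long deadline = System.nanoTime() + maxWait.toNanos();
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                call.register();
                return attempts;
            } catch (ConnectException e) {
                // RM not reachable yet: give up only if another wait would pass the deadline.
                if (System.nanoTime() + retryInterval.toNanos() > deadline) {
                    throw e;
                }
                Thread.sleep(retryInterval.toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated RM that refuses the first two connection attempts.
        int[] calls = {0};
        RegistrationCall flaky = () -> {
            if (++calls[0] < 3) {
                throw new ConnectException("Connection refused");
            }
        };
        int attempts = registerWithRetry(flaky, Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("registered after " + attempts + " attempts"); // registered after 3 attempts
    }
}
```

Bounding the wait, rather than retrying forever, keeps a wrong RM address from hanging NM startup indefinitely. This thread does not say how, or whether, YARN exposes the two values as configuration.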
                
> Nodemanager should be more robust in handling connection failure to ResourceManager when a cluster is started
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-196
>                 URL: https://issues.apache.org/jira/browse/YARN-196
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.0.0-alpha
>            Reporter: Ramgopal N
>            Assignee: Xuan Gong
>         Attachments: MAPREDUCE-3676.patch, YARN-196.10.patch, YARN-196.11.patch, YARN-196.12.1.patch, YARN-196.12.patch, YARN-196.1.patch, YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, YARN-196.6.patch, YARN-196.7.patch, YARN-196.8.patch, YARN-196.9.patch
>
>
> If the NM is started before the RM, the NM shuts down with the following error:
> {code}
> ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
> org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
> 	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
> Caused by: java.lang.reflect.UndeclaredThrowableException
> 	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
> 	... 3 more
> Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
> 	at $Proxy23.registerNodeManager(Unknown Source)
> 	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
> 	... 5 more
> Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1141)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1100)
> 	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
> 	... 7 more
> Caused by: java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
> 	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1117)
> 	... 9 more
> 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
> java.lang.InterruptedException
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
> 2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999
> 2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
> 2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290
> 2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290
> 2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> 2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped.
> 2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
> java.lang.InterruptedException
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}
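In the trace above, the root `java.net.ConnectException` reaches `NodeManager.start` wrapped in `ServiceException`, then `UndeclaredThrowableException`, then `AvroRuntimeException`. Any code that needs to tell "RM not up yet" apart from other startup failures therefore has to look through the cause chain. Below is a minimal sketch of such a check. `isConnectionFailure` is a hypothetical helper, not a YARN method. The demo uses plain JDK exceptions as stand-ins for the Avro and protobuf wrappers, which are not on the JDK classpath.

```java
import java.lang.reflect.UndeclaredThrowableException;
import java.net.ConnectException;

/** HYPOTHETICAL helper: not part of YARN, shown only to illustrate cause-chain inspection. */
public class CauseChainSketch {

    /** Returns true if t or any of its causes is a ConnectException. */
    public static boolean isConnectionFailure(Throwable t) {
        // Depth limit guards against pathological (cyclic) cause chains.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof ConnectException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        // Rebuild the nesting seen in the trace. ConnectException has no
        // (String, Throwable) constructor, so initCause links the two.
        Throwable root = new ConnectException("Connection refused");
        Throwable wrapped = new ConnectException(
                "Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception");
        wrapped.initCause(root);
        Throwable service = new Exception("stand-in for com.google.protobuf.ServiceException", wrapped);
        Throwable undeclared = new UndeclaredThrowableException(service);
        Throwable top = new RuntimeException("stand-in for AvroRuntimeException", undeclared);

        System.out.println(isConnectionFailure(top));                           // true
        System.out.println(isConnectionFailure(new RuntimeException("other"))); // false
    }
}
```

`UndeclaredThrowableException.getCause()` returns the wrapped throwable, so a single `getCause()` walk covers every layer in this trace.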

--
This message is automatically generated by JIRA.
