hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
Date Wed, 13 May 2015 16:48:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542217#comment-14542217
] 

Junping Du commented on YARN-3634:
----------------------------------

bq.  The initialization of the NMCollectorService RPC client is now delayed until the first
use (that's why direct references to nmCollectorService are replaced by the getNMCollectorService()
calls). And that's' the reason synchronization is needed to prevent multiple threads competing
to initialize the RPC client.
Sorry. I was missing that we delay the calling getNMCollectorService() to later when service
is more ready. Agree that synchronized block is necessary here. In addition, we may want to
double check "if (nmCollectorService == null)" within synchronized block given the little
chance that the first thread is just check nmCollectorService while the other thread is setting
nmCollectorService (but haven't done yet).

Other looks good to me.

> TestMRTimelineEventHandling and TestApplication are broken
> ----------------------------------------------------------
>
>                 Key: YARN-3634
>                 URL: https://issues.apache.org/jira/browse/YARN-3634
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-3634-YARN-2928.001.patch, YARN-3634-YARN-2928.002.patch, YARN-3634-YARN-2928.003.patch
>
>
> TestMRTimelineEventHandling is broken. Relevant error message:
> {noformat}
> 2015-05-12 06:28:56,415 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:28:57,416 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:28:58,416 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:28:59,417 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:29:00,418 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:29:01,419 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:29:02,420 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:29:03,420 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:29:04,421 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:29:05,422 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882))
- Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager
(NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector
Service for application_1431412130291_0001
> 2015-05-12 06:29:05,425 WARN  [AsyncDispatcher event handler] containermanager.AuxServices
(AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector
and it got an error at event: CONTAINER_INIT
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0
failed on connection exception: java.net.ConnectException: Connection refused; For more details
see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:97)
> 	at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.addApplication(PerNodeTimelineCollectorsAuxService.java:99)
> 	at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.initializeContainer(PerNodeTimelineCollectorsAuxService.java:126)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:226)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:49)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException:
Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection
exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.postPut(NodeTimelineCollectorManager.java:122)
> 	at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:95)
> 	... 7 more
> Caused by: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148
to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection
refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1496)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1423)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy108.getTimelineCollectorContext(Unknown Source)
> 	at org.apache.hadoop.yarn.server.api.impl.pb.client.CollectorNodemanagerProtocolPBClientImpl.getTimelineCollectorContext(CollectorNodemanagerProtocolPBClientImpl.java:99)
> 	at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.updateTimelineCollectorContext(NodeTimelineCollectorManager.java:188)
> 	at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.postPut(NodeTimelineCollectorManager.java:116)
> 	... 8 more
> Caused by: java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> 	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:625)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:723)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1545)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1462)
> 	... 14 more
> {noformat}
> This surfaced when we switched to use port ":0" for the mini-YARN cluster for the node
collector service.
> Also, TestApplication tests are broken because the mocked context does not have the configuration
object which ApplicationImpl depends on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message