hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6767) Timeline client won't be able to write when TimelineCollector is not up yet, or NM is down
Date Tue, 18 Jul 2017 06:40:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091187#comment-16091187
] 

Vrushali C commented on YARN-6767:
----------------------------------

Here is my observation in one case. I started a job and then killed the NM on the node
where the AM was running. The job completed successfully and I also have a history file.

I see the following error messages in the timeline service context in the AM log.

{code}


2017-07-18 06:31:55,772 ERROR [pool-8-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: TimelineClient has reached to max retry times : 30 for service address: hostname:port
2017-07-18 06:31:55,773 ERROR [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Failed to process Event JOB_FINISHED for the job : job_1500067716904_0256
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:425)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:121)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1289)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:590)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:339)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: TimelineClient has reached to max retry times : 30 for service address: hostname:port
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.checkRetryWithSleep(TimelineV2ClientImpl.java:179)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:151)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:254)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:248)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:375)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:313)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
Caused by: java.io.IOException: com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:195)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:147)
	... 8 more
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
	at com.sun.jersey.api.client.Client.handle(Client.java:648)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
	at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:533)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:188)
	... 9 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
	at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:76)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127)
	at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322)
	at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory$1.run(TimelineConnector.java:261)
	at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory$1.run(TimelineConnector.java:258)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1645)
	at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory.getHttpURLConnection(TimelineConnector.java:258)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
	... 14 more

{code}
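
The trace above is consistent with a bounded retry loop: each write attempt fails with a ConnectException because nothing is listening on the collector address, the client sleeps and retries, and after the configured limit (30 here) it gives up and surfaces an IOException that wraps the last failure. The following is only a minimal sketch of that pattern and is not the actual TimelineV2ClientImpl code. The class name BoundedRetrySketch, the WriteAttempt interface, and the parameters maxRetries and retryIntervalMs are hypothetical, and whether the real limit counts the initial attempt is an assumption here.

```java
import java.io.IOException;
import java.net.ConnectException;

public class BoundedRetrySketch {

  /** Hypothetical stand-in for a single write attempt against the collector. */
  @FunctionalInterface
  public interface WriteAttempt {
    void run() throws IOException;
  }

  /**
   * Runs the attempt once, then retries up to maxRetries more times,
   * sleeping retryIntervalMs between attempts. When the retries are
   * exhausted, the last failure is rethrown wrapped in an IOException,
   * which produces a "Caused by" chain like the one in the log above.
   */
  public static void writeWithRetries(WriteAttempt attempt, int maxRetries,
      long retryIntervalMs) throws IOException {
    int retriesLeft = maxRetries;
    while (true) {
      try {
        attempt.run();
        return; // the write succeeded
      } catch (IOException e) {
        if (retriesLeft <= 0) {
          throw new IOException("Reached max retry times: " + maxRetries, e);
        }
        retriesLeft--;
        try {
          Thread.sleep(retryIntervalMs);
        } catch (InterruptedException ie) {
          // Restore the interrupt flag and stop retrying.
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting to retry", ie);
        }
      }
    }
  }

  public static void main(String[] args) {
    // Simulates a collector that never comes back, as when the NM is killed.
    WriteAttempt alwaysRefused = () -> {
      throw new ConnectException("Connection refused");
    };
    try {
      writeWithRetries(alwaysRefused, 3, 10L);
    } catch (IOException e) {
      System.out.println("Gave up: " + e.getMessage() + " (cause: " + e.getCause() + ")");
    }
  }
}
```

In this sketch the caller only learns about the outage once every retry has been used up, which matches the observation that JOB_FINISHED could not be published even though the job itself succeeded.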



> Timeline client won't be able to write when TimelineCollector is not up yet, or NM is down
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-6767
>                 URL: https://issues.apache.org/jira/browse/YARN-6767
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineclient
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Haibo Chen
>
> As discussed in the call, the timeline client will not be able to write entities when an
> application first starts to run and its corresponding TimelineCollector instance is not up
> yet, or when the TimelineCollector goes down because the node manager dies (TimelineCollector
> now runs as part of the NM auxiliary services). We need to address or mitigate the issue if
> possible, or at least call it out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


