hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers
Date Mon, 26 Mar 2018 21:03:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414543#comment-16414543 ]

Eric Yang commented on YARN-7973:
---------------------------------

[~shanekumpf@gmail.com] Thank you for the example.  Container relaunch mostly works
on my cluster using the example above.  If an app is stopped and restarted, new containers
are acquired.  If a container fails, the same container is reused for the relaunch.  However,
I encountered a problem when flexing the component from 2 containers to 3 and then back down to 2.  The
flex command failed to reach the AM, with the following error message:

{code}
[hbase@eyang-5 hadoop-3.2.0-SNAPSHOT]$ ./bin/yarn app -flex z1 -component ping 2
2018-03-26 20:37:22,968 ERROR client.ApiServiceClient: Fail to flex application: 
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
	at com.sun.jersey.api.client.Client.handle(Client.java:652)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
	at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionFlex(ApiServiceClient.java:417)
	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:519)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:111)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at java.net.Socket.connect(Socket.java:538)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler$1$1.getOutputStream(URLConnectionClientHandler.java:238)
	at com.sun.jersey.api.client.CommittingOutputStream.commitStream(CommittingOutputStream.java:117)
	at com.sun.jersey.api.client.CommittingOutputStream.write(CommittingOutputStream.java:89)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
	at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
	at java.io.BufferedWriter.flush(BufferedWriter.java:254)
	at com.sun.jersey.core.util.ReaderWriter.writeToAsString(ReaderWriter.java:191)
	at com.sun.jersey.core.provider.AbstractMessageReaderWriterProvider.writeToAsString(AbstractMessageReaderWriterProvider.java:128)
	at com.sun.jersey.core.impl.provider.entity.StringProvider.writeTo(StringProvider.java:88)
	at com.sun.jersey.core.impl.provider.entity.StringProvider.writeTo(StringProvider.java:58)
	at com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:217)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
	... 9 more
{code}

There is no error in the AM logs.  The most recent logs are:

{code}
2018-03-26 20:43:32,061 [pool-5-thread-3] INFO  instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1522094540915_0004_01_000014] IP = [172.26.111.20], host = ping-0.z1.hbase.ycluster, cancel container status retriever
2018-03-26 20:43:54,186 [pool-7-thread-1] INFO  component.Component - [COMPONENT ping] state changed from FLEXING -> STABLE
2018-03-26 20:43:54,187 [pool-7-thread-1] INFO  service.ServiceMaster - Service state changed from STARTED -> STABLE
2018-03-26 20:43:54,187 [pool-7-thread-1] INFO  instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1522094540915_0004_01_000014] Transitioned from STARTED to READY on BECOME_READY event
{code}

The same commands work without this patch.
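
In the stack trace above, the failure occurs inside {{Socket.connect}}, so the TCP connection was refused before any HTTP request was sent. For anyone reproducing this, a minimal probe along the following lines could help confirm whether the endpoint is accepting connections at all. This is only a sketch: the class name {{PortProbe}}, the default host and port, and the timeout are illustrative assumptions and do not come from this issue or from the patch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Hadoop.
final class PortProbe {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers java.net.ConnectException ("Connection refused"),
            // connect timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults (assumptions): pass the host and port that
        // your client is actually configured to contact.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8088;
        System.out.println(host + ":" + port + " reachable: "
            + canConnect(host, port, 2000));
    }
}
```

A {{false}} result only shows that nothing accepted the connection at that address; it does not explain why the listener is missing.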

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container when it
> exited. The removal is now handled by the {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is
> intended to reuse the workdir from the previous attempt, and does not call {{cleanupContainer}} prior
> to {{launchContainer}}. The container ID is reused as well. As a result, the previous Docker
> container still exists, resulting in an error from Docker indicating that a container by that
> name already exists.
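
The name collision described above can be illustrated with a toy model. The sketch below is not Hadoop code: {{ToyDockerRuntime}} and {{RelaunchSketch}} are hypothetical classes that only borrow the method names {{launchContainer}} and {{cleanupContainer}} from the description, and a set of names stands in for the containers Docker still knows about. Removing the old container before relaunching is shown as one possible way to avoid the conflict; it is not necessarily what the attached patches do.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; these are NOT the real Hadoop classes.
final class RelaunchSketch {

    static final class ToyDockerRuntime {
        // Names of containers the simulated Docker daemon still holds,
        // including containers that have already exited.
        private final Set<String> existing = new HashSet<>();

        // Simulates creating a named container: fails if the name is taken.
        void launchContainer(String containerId) {
            if (!existing.add(containerId)) {
                throw new IllegalStateException(
                    "a container named " + containerId + " already exists");
            }
        }

        // Simulates removing the previous container.
        void cleanupContainer(String containerId) {
            existing.remove(containerId);
        }
    }

    // Relaunch as described in the issue: same container ID, no cleanup first.
    static boolean relaunchWithoutCleanup(ToyDockerRuntime rt, String id) {
        try {
            rt.launchContainer(id);
            return true;
        } catch (IllegalStateException e) {
            return false; // name conflict with the previous attempt
        }
    }

    // One possible remedy (an assumption): remove the old container first.
    static boolean relaunchWithCleanup(ToyDockerRuntime rt, String id) {
        rt.cleanupContainer(id);
        rt.launchContainer(id);
        return true;
    }

    public static void main(String[] args) {
        ToyDockerRuntime rt = new ToyDockerRuntime();
        String id = "container_example_000001"; // made-up ID
        rt.launchContainer(id); // first attempt; the container later exits
        System.out.println("without cleanup: " + relaunchWithoutCleanup(rt, id)); // false
        System.out.println("with cleanup: " + relaunchWithCleanup(rt, id));       // true
    }
}
```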



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


