hadoop-yarn-issues mailing list archives

From "Shane Kumpf (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers
Date Mon, 26 Mar 2018 19:04:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414336#comment-16414336 ]

Shane Kumpf commented on YARN-7973:
-----------------------------------

{quote}Sorry, I am not clear about the design of container relaunch feature. In what scenario
is container relaunch used?
{quote}
Please see the existing {{ContainerRelaunch}} feature (YARN-3998) to better understand the
initial design. This JIRA is for properly handling that feature with the Docker runtime. The
{{ContainerRetryPolicy}} used by Native Services results in the use of this feature.
{quote}what would happen if the intermediate state of the container is preventing relaunch
to run successfully?
{quote}
It is going to depend on your configuration. By default, Native Services relaunches every
30 seconds until the app lifetime is exceeded. This is the behavior with or without this patch.
With a retry count set, the container will fail after relaunching the specified number of
times.
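
To make the two cases above concrete, here is a minimal sketch of the relaunch schedule for a container that fails immediately on every attempt. It is a simplified model, not YARN code: {{RetrySettings}}, {{relaunch_times}}, and the field names are hypothetical and do not correspond to the actual {{ContainerRetryPolicy}} API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RetrySettings:
    # Hypothetical names for illustration only; not the YARN API.
    interval_s: int = 30                   # delay between relaunch attempts
    max_retries: Optional[int] = None      # None: no retry count configured
    app_lifetime_s: Optional[int] = None   # None: no lifetime limit


def relaunch_times(settings: RetrySettings, horizon_s: int) -> List[int]:
    """Return the times (seconds after the first failure) at which a
    container that always fails immediately would be relaunched,
    looking no further ahead than horizon_s."""
    limit = horizon_s
    if settings.app_lifetime_s is not None:
        limit = min(limit, settings.app_lifetime_s)

    times: List[int] = []
    t = settings.interval_s
    while t <= limit:
        # With a retry count set, stop once that many relaunches happened.
        if settings.max_retries is not None and len(times) >= settings.max_retries:
            break
        times.append(t)
        t += settings.interval_s
    return times
```

Without a retry count, the schedule is bounded only by the app lifetime; with one, the container stops being relaunched after the configured number of attempts.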

How relaunch is used is up to the application/AM, so we can't just look at how Native Services
uses it. We need to fix relaunch for the Docker case in general.

As previously mentioned, IMO, we have two options:
 1) The approach taken here: call "docker start" on the existing container.
 2) Delete the existing Docker container and launch a new one with the same name (the container ID).

Given the design behind YARN-3998, #1 appears to be the most appropriate. It may allow some
applications to recover existing data, which I believe is desirable.
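
The difference between the two options can be sketched with a small in-memory model. This is not the patch and makes no real Docker calls: {{FakeDocker}} and both relaunch helpers are hypothetical, and the {{data}} dict merely stands in for whatever state lives inside the container's own filesystem.

```python
class FakeDocker:
    """In-memory stand-in for a Docker daemon (illustration only)."""

    def __init__(self):
        # container name -> {"running": bool, "data": dict}
        self.containers = {}

    def run(self, name):
        # Like "docker run --name": fails if the name is already taken,
        # even when the earlier container has exited.
        if name in self.containers:
            raise RuntimeError(f"container name {name!r} is already in use")
        self.containers[name] = {"running": True, "data": {}}

    def exit(self, name):
        # The process ends, but the container itself is not removed.
        self.containers[name]["running"] = False

    def start(self, name):
        # Like "docker start": restarts the existing container in place.
        self.containers[name]["running"] = True

    def rm(self, name):
        # Like "docker rm": removes the container along with its data.
        del self.containers[name]


def relaunch_with_start(docker, name):
    """Option 1: reuse the existing container."""
    docker.start(name)


def relaunch_with_recreate(docker, name):
    """Option 2: delete the container, then create one with the same name."""
    docker.rm(name)
    docker.run(name)
```

In this model, calling {{run}} again after {{exit}} raises the name-conflict error, option 1 keeps the entries in {{data}}, and option 2 starts from an empty container.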

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container when it
> exited. The removal is now handled by the {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is
> intended to reuse the workdir from the previous attempt, and does not call {{cleanupContainer}} prior
> to {{launchContainer}}. The container ID is reused as well. As a result, the previous Docker
> container still exists, resulting in an error from Docker indicating that a container by that
> name already exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

