hadoop-yarn-issues mailing list archives

From "Shane Kumpf (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers
Date Mon, 26 Mar 2018 19:04:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414336#comment-16414336 ]

Shane Kumpf commented on YARN-7973:

{quote}Sorry, I am not clear about the design of container relaunch feature. In what scenario is container relaunch used?{quote}
Please see the existing {{ContainerRelaunch}} feature (YARN-3998) to better understand the
initial design. This JIRA is about handling that feature correctly with the Docker runtime.
Native Services exercises this feature through its {{ContainerRetryPolicy}}.
{quote}what would happen if the intermediate state of the container is preventing relaunch to run successfully?{quote}
It depends on your configuration. By default, Native Services relaunches the container every
30 seconds until the app lifetime is exceeded; this is the behavior with or without this patch.
With a retry count set, the container fails after being relaunched the specified number of times.
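To make the two configurations concrete, here is a minimal Python model of when relaunch attempts would happen for a container that fails on every attempt. The function `relaunch_times`, its parameters, and the assumption that each attempt fails immediately are all hypothetical; only the 30-second default interval and the "lifetime vs. retry count" behavior come from the comment above. This is not the {{ContainerRetryPolicy}} code.

```python
from typing import List, Optional


def relaunch_times(app_lifetime_s: int,
                   retry_interval_s: int = 30,
                   max_retries: Optional[int] = None) -> List[int]:
    """Return the offsets (seconds after the first failure) of each relaunch.

    Hypothetical model: every attempt fails immediately, and the next
    relaunch is scheduled retry_interval_s seconds after each failure.
    max_retries=None models the default "relaunch until the app lifetime
    is exceeded"; otherwise at most max_retries relaunches are attempted.
    """
    times: List[int] = []
    t = retry_interval_s
    while t <= app_lifetime_s:  # stop once the app lifetime is exceeded
        if max_retries is not None and len(times) >= max_retries:
            break  # retry budget exhausted: the container is marked failed
        times.append(t)
        t += retry_interval_s
    return times
```

For example, `relaunch_times(120)` yields `[30, 60, 90, 120]`, while `relaunch_times(120, max_retries=2)` stops at `[30, 60]`.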

How relaunch is used is up to the application/AM, so we can't just look at how Native Services
uses it; we need to fix relaunch for the Docker case in general.

As previously mentioned, IMO, we have two options:
 1) The approach taken here: call "docker start" on the existing container.
 2) Remove the existing container and launch a new Docker container with the same name (the container ID).
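A minimal sketch of the Docker CLI calls each option reduces to, assuming the relaunch is driven by plain `docker` commands. `relaunch_commands` is a hypothetical helper, and the `image` argument stands in for the full set of original `docker run` options that option 2 would have to reproduce. It is not taken from YARN's code.

```python
from typing import List


def relaunch_commands(container_name: str, image: str, strategy: int) -> List[List[str]]:
    """Return the docker CLI invocations a relaunch would issue (hypothetical helper).

    strategy 1: restart the existing, exited container in place.
    strategy 2: force-remove the old container, then run a fresh one
                under the same name.
    """
    if strategy == 1:
        # Reuses the same container, including its writable layer.
        return [["docker", "start", container_name]]
    if strategy == 2:
        # Frees the name first, then rebuilds the container from scratch.
        return [
            ["docker", "rm", "-f", container_name],
            ["docker", "run", "-d", "--name", container_name, image],
        ]
    raise ValueError(f"unknown strategy: {strategy}")
```

The trade-off shows up directly. Option 1 is a single call that keeps the existing container and its filesystem state. Option 2 needs the original launch arguments to rebuild the container, and it discards whatever the previous attempt wrote inside it.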

Given the design behind YARN-3998, #1 appears to be the most appropriate. It may also allow some
applications to recover their existing data, which I believe is desirable.

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
> Prior to YARN-5366, {{container-executor}} would remove the Docker container when it
> exited. The removal is now handled by the {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is
> intended to reuse the workdir from the previous attempt and does not call {{cleanupContainer}} prior
> to {{launchContainer}}. The container ID is reused as well. As a result, the previous Docker
> container still exists, resulting in an error from Docker indicating that a container by that
> name already exists.
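The failure mode described above can be reproduced with a small in-memory model. `ToyDockerDaemon`, `NameConflictError` and `relaunch_without_cleanup` are hypothetical names. The class only mimics the Docker behavior relevant here: an exited container keeps its name until it is removed, and `docker start` reuses the container's writable layer. It does not talk to a real Docker daemon.

```python
from typing import Dict


class NameConflictError(Exception):
    """Toy stand-in for Docker's "name already in use" conflict."""


class ToyDockerDaemon:
    """In-memory model of Docker container naming; not a real Docker client."""

    def __init__(self) -> None:
        # name -> {"state": "running" | "exited", "files": {...}}
        # "files" stands in for the container's writable layer.
        self.containers: Dict[str, dict] = {}

    def run(self, name: str) -> None:
        # Like `docker run --name`: fails if any container, even an exited one, has the name.
        if name in self.containers:
            raise NameConflictError(f"container name {name!r} is already in use")
        self.containers[name] = {"state": "running", "files": {}}

    def exit(self, name: str) -> None:
        # The container's process ends; the container itself is not removed.
        self.containers[name]["state"] = "exited"

    def start(self, name: str) -> None:
        # Like `docker start`: reuses the existing container and its writable layer.
        self.containers[name]["state"] = "running"

    def rm(self, name: str) -> None:
        # Like `docker rm`: frees the name and discards the writable layer.
        del self.containers[name]


def relaunch_without_cleanup(daemon: ToyDockerDaemon, name: str) -> str:
    """Reproduce the bug: relaunch reuses the container ID but skips removal."""
    try:
        daemon.run(name)
        return "launched"
    except NameConflictError as err:
        return f"failed: {err}"
```

After `d = ToyDockerDaemon(); d.run(cid); d.exit(cid)`, calling `relaunch_without_cleanup(d, cid)` returns a `failed: ...` string. Calling `d.start(cid)` instead succeeds and leaves `d.containers[cid]["files"]` intact, which is the data-recovery property that option #1 relies on.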

This message was sent by Atlassian JIRA
