hadoop-yarn-issues mailing list archives

From "Shane Kumpf (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers
Date Fri, 30 Mar 2018 13:50:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420505#comment-16420505 ]

Shane Kumpf commented on YARN-7973:

Thanks for trying out the patch [~eyang]!

{quote} Container relaunch is kind of working on my cluster using the example above.  If an
app is stopped, and restarted, new containers would be acquired.  If container fails, and
the same one will be used for relaunch. {quote}
So it seems that there may be inconsistent use of the container relaunch policy in Native
Services. That isn't really in scope for this patch, but sounds like something we should review
in a separate issue. The only change in flow is when a container transitions to the relaunching
state and Docker is in use, so this patch doesn't change how Native Services leverages that.

{quote}However, I encountered a problem where flexing containers from 2 to 3, then decrease
back to 2.  The flexing command failed to be received by AM with the following error message{quote}
I haven't been able to recreate this. Based on the exception type, it looks like the Services
API may have been down? Can you share the RM and NM logs when this happens? I really wouldn't
expect this patch to be related to that exception as it doesn't touch the Services API.

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
> Prior to YARN-5366, {{container-executor}} would remove the Docker container when it
exited. The removal is now handled by the {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is
intended to reuse the workdir from the previous attempt, and does not call {{cleanupContainer}} prior
to {{launchContainer}}. The container ID is reused as well. As a result, the previous Docker
container still exists, resulting in an error from Docker indicating that a container by that
name already exists.
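
To make the failure mode in the description concrete, here is a minimal sketch in Java. It does not use any YARN or Docker API: {{FakeDocker}}, {{relaunchWithoutCleanup}}, {{relaunchWithCleanup}} and the container ID are hypothetical names invented for illustration, and the "remove before relaunch" variant is only one possible remedy, not necessarily what the attached patch does.

```java
import java.util.HashSet;
import java.util.Set;

public class RelaunchSketch {

    /** Hypothetical stand-in for the Docker daemon: it only tracks container names. */
    static class FakeDocker {
        private final Set<String> names = new HashSet<>();

        /** Mimics running a named container: fails if the name is already taken. */
        void run(String name) {
            if (!names.add(name)) {
                throw new IllegalStateException("container name already in use: " + name);
            }
        }

        /** Mimics removing a container; an absent name is treated as a no-op here. */
        void remove(String name) {
            names.remove(name);
        }
    }

    /** Relaunch as described in the issue: the same ID is reused with no cleanup first. */
    static boolean relaunchWithoutCleanup(FakeDocker docker, String containerId) {
        try {
            docker.run(containerId);
            return true;
        } catch (IllegalStateException e) {
            return false; // the exited container from the previous attempt still holds the name
        }
    }

    /** Assumed remedy for illustration only: remove the old container, then launch again. */
    static boolean relaunchWithCleanup(FakeDocker docker, String containerId) {
        docker.remove(containerId);
        return relaunchWithoutCleanup(docker, containerId);
    }

    public static void main(String[] args) {
        FakeDocker docker = new FakeDocker();
        String id = "container_0001"; // hypothetical container ID

        docker.run(id); // first attempt: the container exits but is never removed

        System.out.println("relaunch without cleanup succeeded: "
                + relaunchWithoutCleanup(docker, id));
        System.out.println("relaunch with cleanup succeeded: "
                + relaunchWithCleanup(docker, id));
    }
}
```

The sketch only models the name collision. The real relaunch path also reuses the workdir from the previous attempt, which is left out here.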

