hadoop-yarn-issues mailing list archives

From "Billie Rinaldi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers
Date Fri, 09 Mar 2018 18:37:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393346#comment-16393346 ]

Billie Rinaldi commented on YARN-7973:
--------------------------------------

I started taking a look at patch 002. When I ran my first app, I had a configuration problem:
I was trying to run a privileged container as a user that wasn't allowed to run privileged
containers. The container failed with the appropriate message about the user failing the ACL
check, but when it was relaunched the following was logged repeatedly. It seems like we could
improve the failure handling in scenarios like this.
{noformat}
2018-03-08 22:02:53,791 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Getting container-status for container_1520546307703_0001_01_000002
2018-03-08 22:02:53,791 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Returning ContainerStatus: [ContainerId: container_1520546307703_0001_01_000002, ExecutionType:
GUARANTEED, State: RUNNING, Capability: <memory:1024, vCores:1>, Diagnostics: [2018-03-08
22:02:53.397]Exception from container-launch.
Container id: container_1520546307703_0001_01_000002
Exit code: -1
Exception message: <unknown>
Shell output: <unknown>

[2018-03-08 22:02:53.500]Diagnostic message from attempt 0 : [2018-03-08 22:02:53.500]
[2018-03-08 22:02:53.501]Container exited with a non-zero exit code -1.
, ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
{noformat}
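
One way to read the suggestion above is that a relaunch is only worthwhile when the failure could plausibly go away on a later attempt. The following is a minimal sketch of that idea, not code from the patch: {{FailureKind}}, {{shouldRelaunch}}, and the attempt limit are hypothetical names introduced purely for illustration and do not exist in YARN.

```java
public class RelaunchPolicySketch {
    /** Hypothetical failure categories; not taken from the YARN code base. */
    enum FailureKind { TRANSIENT, CONFIGURATION }

    /**
     * Decides whether another relaunch attempt is worthwhile.
     * A configuration failure (for example, a user failing the privileged-container
     * ACL check) would fail the same way on every attempt, so it is never retried.
     */
    static boolean shouldRelaunch(FailureKind kind, int attemptsSoFar, int maxAttempts) {
        if (kind == FailureKind.CONFIGURATION) {
            return false;
        }
        return attemptsSoFar < maxAttempts;
    }

    public static void main(String[] args) {
        // ACL-style failure: give up immediately instead of relaunching repeatedly.
        System.out.println(shouldRelaunch(FailureKind.CONFIGURATION, 0, 3));
        // Transient failure with attempts remaining: relaunch.
        System.out.println(shouldRelaunch(FailureKind.TRANSIENT, 1, 3));
        // Transient failure with attempts exhausted: stop.
        System.out.println(shouldRelaunch(FailureKind.TRANSIENT, 3, 3));
    }
}
```

With a check along these lines, the original ACL diagnostic would remain the final status of the container instead of being followed by repeated attempts that report an unknown exception.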

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container when it
> exited. The removal is now handled by the {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is
> intended to reuse the workdir from the previous attempt, and does not call {{cleanupContainer}} prior
> to {{launchContainer}}. The container ID is reused as well. As a result, the previous Docker
> container still exists, resulting in an error from Docker indicating that a container by that
> name already exists.
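
The name conflict described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the change made in the attached patches: {{FakeDocker}}, {{relaunchWithoutCleanup}}, and {{relaunchWithCleanup}} are hypothetical names, and the fake daemon only models the fact that a container name stays reserved until the container is removed.

```java
import java.util.HashSet;
import java.util.Set;

public class RelaunchSketch {
    /** Hypothetical stand-in for the Docker daemon: tracks container names that still exist. */
    static class FakeDocker {
        private final Set<String> names = new HashSet<>();

        /** Mimics `docker run --name`: fails if the name is already taken. */
        void run(String name) {
            if (!names.add(name)) {
                throw new IllegalStateException("container name already in use: " + name);
            }
        }

        /** Mimics `docker rm`; removing a missing container is a no-op in this simplification. */
        void remove(String name) {
            names.remove(name);
        }
    }

    /** Relaunch as described in the issue: no cleanup before the second launch. */
    static boolean relaunchWithoutCleanup(FakeDocker docker, String containerId) {
        try {
            docker.run(containerId);
            return true;
        } catch (IllegalStateException e) {
            return false; // the exited container from the previous attempt still holds the name
        }
    }

    /** One possible approach: remove the stale container, then launch under the same ID. */
    static boolean relaunchWithCleanup(FakeDocker docker, String containerId) {
        docker.remove(containerId);
        return relaunchWithoutCleanup(docker, containerId);
    }

    public static void main(String[] args) {
        String id = "container_1520546307703_0001_01_000002";
        FakeDocker docker = new FakeDocker();
        docker.run(id); // first attempt; the container exits but is never removed
        System.out.println(relaunchWithoutCleanup(docker, id)); // prints "false"
        System.out.println(relaunchWithCleanup(docker, id));    // prints "true"
    }
}
```

Removing the old container is only one option. Because the workdir and container ID are reused, restarting the existing container would be another, and the sketch does not claim either is what the patch implements.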



