mesos-issues mailing list archives

From "Andrew Schwartzmeyer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-8488) Docker bug can cause unkillable tasks.
Date Sat, 03 Mar 2018 02:18:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384432#comment-16384432 ]

Andrew Schwartzmeyer edited comment on MESOS-8488 at 3/3/18 2:17 AM:
---------------------------------------------------------------------

Commit 1daf6cb03
Author: Akash Gupta akash-gupta@hotmail.com
Date:   Sun Feb 25 13:37:42 2018 -0800
Windows: Fixed flaky Docker command health check test.

The `DockerContainerizerHealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange`
test was flaky on Windows because the Docker executor manually reaps the
container's exit code when `docker run` fails to report it. This logic
doesn't work on Windows, since the process might not be visible to the
container host machine, which causes `TASK_FAILED` to be sent. Removing
the reaping logic on Windows makes the test much more reliable.

Review: https://reviews.apache.org/r/65733/


> Docker bug can cause unkillable tasks.
> --------------------------------------
>
>                 Key: MESOS-8488
>                 URL: https://issues.apache.org/jira/browse/MESOS-8488
>             Project: Mesos
>          Issue Type: Improvement
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Greg Mann
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: mesosphere
>             Fix For: 1.6.0
>
>
> Due to an [issue on the Moby project|https://github.com/moby/moby/issues/33820], it's possible for Docker versions 1.13 and later to fail to catch a container exit, so that the {{docker run}} command which was used to launch the container will never return. This can lead to the Docker executor becoming stuck in a state where it believes the container is still running and cannot be killed.
> We should update the Docker executor to ensure that containers stuck in such a state cannot cause unkillable Docker executors/tasks.
> One way to do this would be a timeout, after which the Docker executor will commit suicide if a kill task attempt has not succeeded. However, if we do this we should also ensure that in the case that the container was actually still running, either the Docker daemon or the DockerContainerizer would clean up the container when it does exit.
> Another option might be for the Docker executor to directly {{wait()}} on the container's Linux PID, in order to notice when the container exits.
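The two remedies proposed above can be sketched together in Python. The first bounds a kill attempt with a timeout, and the second watches the container's PID directly. This is a minimal, hypothetical illustration, not Mesos code. The names `pid_exists`, `wait_for_pid_exit` and `kill_with_deadline`, along with the `"killed"` / `"self_terminate"` results, are assumptions. The sketch also assumes a POSIX host on which the container's host PID is already known.

One wrinkle matters here. `waitpid()` only works on the caller's own children, and a container's init process is normally not a child of the executor. For that case the sketch falls back to polling with `kill(pid, 0)`.

```python
import os
import time
from typing import Callable, Optional

# Hypothetical helpers for illustration only; none of these names exist in
# the Mesos Docker executor.


def pid_exists(pid: int) -> bool:
    """POSIX existence check: signal 0 delivers nothing, it only validates the PID."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_pid_exit(pid: int, timeout: Optional[float] = None,
                      poll_interval: float = 0.05) -> bool:
    """Return True once `pid` has exited, or False if `timeout` elapses first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            # waitpid() only works on our own children; WNOHANG keeps it non-blocking.
            reaped, _ = os.waitpid(pid, os.WNOHANG)
            if reaped == pid:
                return True
        except ChildProcessError:
            # Not our child (the usual case for a container's init process),
            # so fall back to polling for the PID's existence.
            if not pid_exists(pid):
                return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def kill_with_deadline(pid: int, send_kill: Callable[[], None],
                       grace_seconds: float) -> str:
    """Issue a kill, then give the container a bounded time to exit."""
    send_kill()  # e.g. something like `docker stop`; assumed not to block forever
    if wait_for_pid_exit(pid, timeout=grace_seconds):
        return "killed"
    # The exit was never observed, so the executor would terminate itself here
    # and rely on the daemon or containerizer to clean up the container later.
    return "self_terminate"
```

Polling has its own caveat. If the container exits and its PID is recycled before the next poll, this sketch would keep waiting on the unrelated new process.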



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
