mesos-issues mailing list archives

From "Jie Yu (JIRA)" <>
Subject [jira] [Commented] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.
Date Thu, 09 Nov 2017 00:00:09 GMT


Jie Yu commented on MESOS-6743:

Does this address the issue of the daemon hanging, i.e., MESOS-5722?

If not, we should either reopen this ticket, or reopen MESOS-5722.

> Docker executor hangs forever if `docker stop` fails.
> -----------------------------------------------------
>                 Key: MESOS-6743
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 1.0.1, 1.1.0, 1.2.1, 1.3.0
>            Reporter: Alexander Rukletsov
>            Assignee: Andrei Budnik
>            Priority: Critical
>              Labels: mesosphere, reliability
>             Fix For: 1.1.3, 1.2.3, 1.3.2, 1.4.0
> If {{docker stop}} finishes with an error status, the executor should catch this and react instead of indefinitely waiting for {{reaped}} to return.
> An interesting question is _how_ to react. Here are possible solutions.
> 1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do if {{docker stop}} continues to fail.
> 2. Unmark the task as {{killed}}. This will allow frameworks to retry the kill. However, in this case it is unclear what status updates we should send: {{TASK_KILLING}} for every kill retry? An extra update when we fail to kill a task? Or a specific reason set in {{TASK_KILLING}}?
> 3. Clean up and exit. In this case we should make sure the task container is killed or notify the framework and the operator that the container may still be running.
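
For illustration only, here is a rough sketch of how options 1 and 3 from the quoted description could be combined: retry {{docker stop}} a bounded number of times, then fall back to {{docker kill}}, and report when the container may still be running. This is not the Mesos docker executor code (which is C++); the names `run_docker` and `stop_container`, the retry count, the backoff, and the returned status strings are all assumptions made up for this example.

```python
import subprocess
import time
from typing import Callable, List

# Hypothetical type for anything that runs a docker CLI command and
# returns its exit code; injectable so the logic can be tested without docker.
Runner = Callable[[List[str]], int]


def run_docker(args: List[str], timeout: float = 30.0) -> int:
    """Run `docker <args>`; return the exit code, or -1 on timeout or launch failure."""
    try:
        return subprocess.run(["docker", *args], timeout=timeout).returncode
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing CLI is treated like a failed command, so the
        # caller never waits forever on a single invocation.
        return -1


def stop_container(
    name: str,
    runner: Runner = run_docker,
    max_retries: int = 3,          # assumption: the ticket leaves the count open
    backoff: float = 1.0,          # assumption: simple exponential backoff
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try `docker stop` up to max_retries times, then `docker kill`.

    Returns "stopped", "killed", or "may_still_be_running".
    """
    for attempt in range(max_retries):
        if runner(["stop", "-t", "10", name]) == 0:
            return "stopped"
        if attempt < max_retries - 1:
            sleep(backoff * (2 ** attempt))

    # Option 3: clean up forcefully before giving up.
    if runner(["kill", name]) == 0:
        return "killed"

    # Nothing worked; the caller should notify the framework and operator.
    return "may_still_be_running"
```

A caller would map the result to a status update, for example treating "may_still_be_running" as the case where the framework and the operator need to be told that the container could not be confirmed dead. Which task state and reason to send in that case is exactly the open question in option 2 and is not answered by this sketch.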

This message was sent by Atlassian JIRA