mesos-issues mailing list archives

From "Alexander Rukletsov (JIRA)" <>
Subject [jira] [Updated] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.
Date Thu, 13 Jul 2017 08:13:00 GMT


Alexander Rukletsov updated MESOS-6743:
    Sprint: Mesosphere Sprint 60
    Labels: mesosphere reliability  (was: mesosphere)

> Docker executor hangs forever if `docker stop` fails.
> -----------------------------------------------------
>                 Key: MESOS-6743
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 1.0.1, 1.1.0, 1.2.1, 1.3.0
>            Reporter: Alexander Rukletsov
>            Priority: Critical
>              Labels: mesosphere, reliability
> If {{docker stop}} finishes with an error status, the executor should detect this and react, instead of waiting indefinitely for {{reaped}} to return.
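The check described above can be sketched as follows. This is an illustrative Python sketch, not the Docker executor's actual C++ code. It assumes a hypothetical `run` callable that executes a command and returns its exit status, and it models the reap notification as a hypothetical `threading.Event`. The point is that a non-zero exit status from `docker stop` is reported to the caller, and the wait for the reap is bounded. Without these checks, a failed stop leads to an unbounded wait.

```python
import subprocess
import threading
from typing import Callable, Sequence

# Hypothetical command runner: takes an argv list and returns the exit status.
Runner = Callable[[Sequence[str]], int]


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command for real and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def stop_and_wait(
    container: str,
    reaped: threading.Event,  # assumption: set when the container is reaped
    run: Runner = default_runner,
    timeout: float = 30.0,  # assumption: illustrative upper bound in seconds
) -> str:
    status = run(["docker", "stop", container])
    if status != 0:
        # Surface the failure instead of waiting on `reaped` forever.
        return "stop_failed"
    # Bound the wait too, in case the reap notification never arrives.
    if not reaped.wait(timeout):
        return "reap_timeout"
    return "stopped"
```

The string results are placeholders for whatever reaction the executor chooses. The options listed below differ precisely in what happens on `"stop_failed"`.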
> An interesting question is _how_ to react. Here are some possible solutions:
> 1. Retry {{docker stop}}. In this case, it is unclear how many times to retry and what to do if {{docker stop}} continues to fail.
> 2. Unmark the task as {{killed}}. This would allow frameworks to retry the kill. However, in this case it is unclear which status updates we should send: a {{TASK_KILLING}} update for every kill retry? An extra update when we fail to kill the task? Or {{TASK_KILLING}} with a specific reason?
> 3. Clean up and exit. In this case, we should either make sure the task container is killed, or notify the framework and the operator that the container may still be running.
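One way to combine options 1 and 3 is sketched below. The sketch retries `docker stop` a bounded number of times, then escalates to `docker kill`, and if that also fails, reports that the container may still be running. The retry limit, the escalation to `docker kill`, and the `notify` callback are assumptions made for illustration. The issue does not record a decision, and this is not the Mesos implementation. As in any such sketch, `run` is a hypothetical callable that returns a command's exit status.

```python
from typing import Callable, Sequence

# Hypothetical command runner: takes an argv list and returns the exit status.
Runner = Callable[[Sequence[str]], int]


def kill_with_retries(
    container: str,
    run: Runner,
    notify: Callable[[str], None],  # assumption: reaches framework/operator
    max_stop_attempts: int = 3,  # assumption: arbitrary bound for option 1
) -> bool:
    """Return True if the container was stopped or killed, False otherwise."""
    # Option 1: retry `docker stop`, but only a bounded number of times.
    for _ in range(max_stop_attempts):
        if run(["docker", "stop", container]) == 0:
            return True
    # Option 3: clean up by escalating to `docker kill` (SIGKILL by default)...
    if run(["docker", "kill", container]) == 0:
        return True
    # ...and if even that fails, report that the container may still be running.
    notify(f"container {container} may still be running")
    return False
```

Bounding the retries answers the "how many times" question from option 1 with an explicit parameter. The final `notify` call covers the case in option 3 where the executor cannot guarantee that the container is gone.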

This message was sent by Atlassian JIRA
