mesos-issues mailing list archives

From "Benjamin Bannier (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.
Date Fri, 09 Dec 2016 10:17:59 GMT

     [ https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-6743:
------------------------------------
    Description: 
If {{docker stop}} finishes with an error status, the executor should catch this and react
instead of indefinitely waiting for {{reaped}} to return.

An interesting question is _how_ to react. Here are possible solutions.

1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do
if {{docker stop}} continues to fail.

2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. However, in this
case it is unclear what status updates we should send: {{TASK_KILLING}} for every kill retry?
an extra update when we failed to kill a task? or set a specific reason in {{TASK_KILLING}}?

3. Clean up and exit. In this case we should make sure the task container is killed or notify
the framework and the operator that the container may still be running.

  was:
If {{docker stop}} finishes with an error status, the executor should catch this and react
instead of indefinitely waiting for {{reaped}} to return.

An interesting question is _how_ to react. Here are possible solutions.

1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do
if {{docker stop}} continues to fail.

2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. However, in this
case it is unclear what status updates we should send: {TASK_KILLING}} for every kill retry?
an extra update when we failed to kill a task? or set a specific reason in {{TASK_KILLING}}?

3. Clean up and exit. In this case we should make sure the task container is killed or notify
the framework and the operator that the container may still be running.


> Docker executor hangs forever if `docker stop` fails.
> -----------------------------------------------------
>
>                 Key: MESOS-6743
>                 URL: https://issues.apache.org/jira/browse/MESOS-6743
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 1.0.1, 1.1.0
>            Reporter: Alexander Rukletsov
>              Labels: mesosphere
>
> If {{docker stop}} finishes with an error status, the executor should catch this and
> react instead of indefinitely waiting for {{reaped}} to return.
> An interesting question is _how_ to react. Here are possible solutions.
> 1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what
> to do if {{docker stop}} continues to fail.
> 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. However,
> in this case it is unclear what status updates we should send: {{TASK_KILLING}} for every
> kill retry? an extra update when we failed to kill a task? or set a specific reason in
> {{TASK_KILLING}}?
> 3. Clean up and exit. In this case we should make sure the task container is killed or
> notify the framework and the operator that the container may still be running.
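
For illustration only, options 1 and 3 could be combined roughly as in the sketch below. It is not code from the Mesos docker executor (which is written in C++) and does not settle the open questions above. The helper names, the retry bound, the backoff, the per-attempt timeout, and the fallback to `docker kill` are all assumptions chosen for the example. The point is that every path terminates and reports an outcome, so the caller never waits indefinitely.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_command(argv: Sequence[str], timeout_secs: float) -> bool:
    """Run a command; return True only if it exits with status 0 in time."""
    try:
        result = subprocess.run(
            argv,
            timeout=timeout_secs,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing CLI is treated the same as an error status.
        return False
    return result.returncode == 0


def stop_container(
    container: str,
    max_attempts: int = 3,        # hypothetical retry bound (option 1)
    backoff_secs: float = 1.0,    # hypothetical initial delay between retries
    timeout_secs: float = 30.0,   # hypothetical per-attempt timeout
    run: Callable[[Sequence[str], float], bool] = run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return 'stopped', 'killed', or 'unknown'; never block forever."""
    # Option 1: retry `docker stop` a bounded number of times.
    for attempt in range(max_attempts):
        if run(["docker", "stop", container], timeout_secs):
            return "stopped"
        if attempt < max_attempts - 1:
            sleep(backoff_secs * (2 ** attempt))  # exponential backoff

    # Option 3: clean up. Escalate to `docker kill` (an assumption of this
    # sketch) and report whether the container may still be running.
    if run(["docker", "kill", container], timeout_secs):
        return "killed"
    return "unknown"
```

A caller would map the result to whatever status update is eventually agreed on. An `"unknown"` result is the case in which the framework and the operator would need to be told that the container may still be running.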



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
