Mailing-List: contact issues-help@mesos.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@mesos.apache.org
Date: Fri, 9 Dec 2016 10:17:59 +0000 (UTC)
From: "Benjamin Bannier (JIRA)" <jira@apache.org>
To: issues@mesos.apache.org
Message-ID: <JIRA.13026376.1481123596000.482036.1481278679588@Atlassian.JIRA>
In-Reply-To: <JIRA.13026376.1481123596000@Atlassian.JIRA>
References: <JIRA.13026376.1481123596000@Atlassian.JIRA> <JIRA.13026376.1481123596540@arcas>
Subject: [jira] [Updated] (MESOS-6743) Docker executor hangs forever if
 `docker stop` fails.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Fri, 09 Dec 2016 10:18:01 -0000


     [ https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-6743:
------------------------------------
    Description: 
If {{docker stop}} finishes with an error status, the executor should catch this and react instead of indefinitely waiting for {{reaped}} to return.
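A minimal sketch of the first step: check the exit status of the stop command instead of assuming it succeeded and blocking on the reap. It is written in Python for brevity, while the real executor is C++. The helper {{run_stop}} and the {{StopResult}} type are hypothetical. The command is passed in so that {{docker stop <container>}} can be substituted, and a hung or missing command is treated as a failure as well.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StopResult:
    ok: bool
    returncode: int
    stderr: str


def run_stop(cmd: Sequence[str], timeout_s: float = 30.0) -> StopResult:
    """Run a stop command (e.g. ["docker", "stop", name]) and report its status.

    Hypothetical helper: the caller inspects the result and reacts, rather
    than waiting indefinitely for the container process to be reaped.
    """
    try:
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; treat a hung stop as a failure.
        return StopResult(ok=False, returncode=-1, stderr="stop command timed out")
    except OSError as e:
        # e.g. the docker binary is missing or not executable.
        return StopResult(ok=False, returncode=-1, stderr=str(e))
    return StopResult(ok=proc.returncode == 0, returncode=proc.returncode,
                      stderr=proc.stderr)


if __name__ == "__main__":
    # Stand-ins for `docker stop` so the sketch runs without Docker installed.
    print(run_stop([sys.executable, "-c", "pass"]))
    print(run_stop([sys.executable, "-c", "import sys; sys.exit(1)"]))
```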

An interesting question is _how_ to react. Here are possible solutions.

1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do if {{docker stop}} continues to fail.

2. Unmark the task as {{killed}}. This allows frameworks to retry the kill. However, it is then unclear which status updates we should send: a {{TASK_KILLING}} update for every kill retry? An extra update when we fail to kill the task? Or a {{TASK_KILLING}} update with a specific reason set?

3. Clean up and exit. In this case we should either make sure the task container is killed, or notify the framework and the operator that the container may still be running.
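Options 1 and 3 can be combined: retry the stop a bounded number of times and, if it keeps failing, give up and report that the container may still be running, so the executor can clean up and exit instead of hanging. The sketch below assumes a hypothetical {{stop}} callable that returns {{True}} on success and a hypothetical {{report}} callback standing in for whatever notification reaches the framework and operator. The retry count and backoff values are placeholders, not values taken from Mesos.

```python
import time
from typing import Callable


def stop_with_retries(
    stop: Callable[[], bool],
    report: Callable[[str], None],
    max_attempts: int = 3,
    initial_backoff_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `stop` up to `max_attempts` times with exponential backoff.

    Returns True once the stop succeeds. If every attempt fails, calls
    `report` so the framework/operator learn the container may still be
    running, and returns False so the caller can clean up and exit.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    backoff = initial_backoff_s
    for attempt in range(1, max_attempts + 1):
        if stop():
            return True
        if attempt < max_attempts:
            # Wait before the next attempt, doubling the delay each time.
            sleep(backoff)
            backoff *= 2
    report(f"docker stop failed {max_attempts} times; "
           "container may still be running")
    return False


if __name__ == "__main__":
    # Simulated stop that fails twice and then succeeds.
    outcomes = iter([False, False, True])
    print(stop_with_retries(lambda: next(outcomes), print, sleep=lambda s: None))
```

Injecting {{sleep}} keeps the policy testable without real delays; the bounded attempt count answers "how many times to retry" with an explicit, tunable limit rather than retrying forever.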

  was:
If {{docker stop}} finishes with an error status, the executor should catch this and react instead of indefinitely waiting for {{reaped}} to return.

An interesting question is _how_ to react. Here are possible solutions.

1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do if {{docker stop}} continues to fail.

2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. However, in this case it is unclear what status updates we should send: {TASK_KILLING}} for every kill retry? an extra update when we failed to kill a task? or set a specific reason in {{TASK_KILLING}}?

3. Clean up and exit. In this case we should make sure the task container is killed or notify the framework and the operator that the container may still be running.


> Docker executor hangs forever if `docker stop` fails.
> -----------------------------------------------------
>
>                 Key: MESOS-6743
>                 URL: https://issues.apache.org/jira/browse/MESOS-6743
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 1.0.1, 1.1.0
>            Reporter: Alexander Rukletsov
>              Labels: mesosphere


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)