mesos-issues mailing list archives

From "Greg Mann (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MESOS-8538) Consider adding a timeout to Docker executor task launch
Date Fri, 02 Feb 2018 22:06:00 GMT
Greg Mann created MESOS-8538:
--------------------------------

             Summary: Consider adding a timeout to Docker executor task launch
                 Key: MESOS-8538
                 URL: https://issues.apache.org/jira/browse/MESOS-8538
             Project: Mesos
          Issue Type: Improvement
            Reporter: Greg Mann


To be more resilient to an unresponsive Docker daemon on an agent, the Docker executor
could apply a timeout to its task launches. If its initial {{docker inspect}} call fails
to return within this timeout, the executor could terminate itself.
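
A minimal sketch of such a timeout, written in Python for brevity (the Docker executor
itself is C++). The helper name run_with_timeout and the 30-second value in the usage note
are illustrative assumptions, not Mesos code:

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_secs):
    """Run cmd and return its stdout, or None if it does not finish in time.

    Hypothetical helper: on timeout, subprocess.run() kills the child process
    and waits for it before raising TimeoutExpired.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_secs
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a hung CLI call: a child that sleeps longer than the timeout.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    if run_with_timeout(hung, 0.2) is None:
        print("command timed out; the caller could now terminate itself")
```

An executor-like caller might invoke run_with_timeout(["docker", "inspect", name], 30) and
exit when it gets None back. Note that killing the {{docker inspect}} client process only
stops the client; it does nothing to a container that the daemon has already started.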

However, we must be careful to clean up properly in such a case. For example, if the executor's
{{docker run}} command succeeded but {{docker inspect}} then failed to return, we would
want to be sure that the Docker containerizer destroys the running container.
Furthermore, a timeout could lead to a state where the executor terminates and a TASK_FAILED
is forwarded to the master, but the task container continues to run on
the agent until the daemon becomes responsive again. If a launch timeout is implemented, care
should be taken to avoid such inconsistent states.
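
One possible policy for avoiding that inconsistency, sketched in Python under stated
assumptions: the terminal status update is withheld until destruction of the container is
confirmed. The class and method names are hypothetical and do not correspond to any Mesos
API; this only illustrates the ordering, not how the containerizer would implement it.

```python
class LaunchTimeoutTracker:
    """Defers the terminal status update until container cleanup is confirmed.

    Hypothetical sketch: tracks containers that may still be running after a
    launch timeout, so TASK_FAILED is not reported while one is still alive.
    """

    def __init__(self):
        self._pending_destroy = set()

    def on_launch_timeout(self, container_id, run_succeeded):
        """Return the status to forward now, or None if it must wait."""
        if run_succeeded:
            # `docker run` succeeded, so the container may be running:
            # hold the terminal update until it is known to be destroyed.
            self._pending_destroy.add(container_id)
            return None
        # Nothing was started, so the failure can be reported immediately.
        return "TASK_FAILED"

    def on_destroy_confirmed(self, container_id):
        """Return the deferred status once the container is known to be gone."""
        if container_id in self._pending_destroy:
            self._pending_destroy.remove(container_id)
            return "TASK_FAILED"
        return None

    def has_pending_cleanup(self):
        return bool(self._pending_destroy)
```

The trade-off in this sketch is latency: while the daemon stays unresponsive, the master
keeps seeing the task in its previous state rather than a TASK_FAILED that might not yet be
true of the agent.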



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
