mesos-issues mailing list archives

From "Ivan Chernetsky (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-8391) Mesos agent doesn't notice that a pod task exits or crashes after the agent restart
Date Thu, 04 Jan 2018 18:04:01 GMT

     [ https://issues.apache.org/jira/browse/MESOS-8391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Chernetsky updated MESOS-8391:
-----------------------------------
    Priority: Blocker  (was: Critical)

> Mesos agent doesn't notice that a pod task exits or crashes after the agent restart
> -----------------------------------------------------------------------------------
>
>                 Key: MESOS-8391
>                 URL: https://issues.apache.org/jira/browse/MESOS-8391
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization, executor
>    Affects Versions: 1.5.0
>            Reporter: Ivan Chernetsky
>            Priority: Blocker
>         Attachments: agent.log.gz
>
>
> h4. (1) Agent doesn't detect that a pod task exits/crashes
> # Create a Marathon pod with two containers which just do {{sleep 10000}}.
> # Restart the Mesos agent on the node where the pod was launched.
> # Kill one of the pod tasks.
> *Expected result*: The Mesos agent detects that one of the tasks was killed and forwards a {{TASK_FAILED}} status update to Marathon.
> *Actual result*: The Mesos agent does nothing, and the Mesos master thinks that both tasks are running just fine. Marathon doesn't take any action because it doesn't receive any update from Mesos.
> h4. (2) After another agent restart, the agent detects that the task crashed and forwards the correct status update, but the other task stays in {{TASK_KILLING}} state forever
> # Perform the steps in (1).
> # Restart the Mesos agent again.
> *Expected result*: The Mesos agent detects that one of the tasks crashed, forwards the corresponding status update, and kills the other task too.
> *Actual result*: The Mesos agent detects that one of the tasks crashed and forwards the corresponding status update, but the other task stays in {{TASK_KILLING}} state forever.
> Note that after yet another agent restart, the other task finally gets killed and the correct status updates are propagated all the way to Marathon.
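
For anyone trying to reproduce step 1 of (1), the following is a minimal sketch of a two-container pod definition, not the reporter's actual pod. The field layout follows the Marathon pod schema as commonly documented; the pod id {{/sleep-pod}}, the container names, and the resource values are hypothetical placeholders. The resulting JSON would typically be submitted to Marathon's pods endpoint ({{/v2/pods}}); verify the schema against your Marathon version.

```python
import json


def build_pod_definition(pod_id="/sleep-pod", sleep_seconds=10000):
    """Build a pod definition with two containers that only sleep.

    The pod id, container names, and resource sizes are illustrative
    assumptions; only the `sleep 10000` command comes from the report.
    """

    def container(name):
        return {
            "name": name,
            "exec": {"command": {"shell": f"sleep {sleep_seconds}"}},
            "resources": {"cpus": 0.1, "mem": 32},
        }

    return {
        "id": pod_id,
        "scaling": {"kind": "fixed", "instances": 1},
        "containers": [container("task1"), container("task2")],
        "networks": [{"mode": "host"}],
    }


if __name__ == "__main__":
    # Print the JSON document so it can be saved and submitted to Marathon.
    print(json.dumps(build_pod_definition(), indent=2))
```

Once the pod is running, restart the agent and kill one of the two {{sleep}} processes to continue with steps 2 and 3.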



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
