mesos-issues mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Date Mon, 07 Aug 2017 23:54:01 GMT

     [ https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler reassigned MESOS-7744:
--------------------------------------

    Assignee: Benjamin Mahler

> Mesos Agent Sends TASK_KILL status update to Master, and still launches task
> ----------------------------------------------------------------------------
>
>                 Key: MESOS-7744
>                 URL: https://issues.apache.org/jira/browse/MESOS-7744
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.0.1
>            Reporter: Sargun Dhillon
>            Assignee: Benjamin Mahler
>            Priority: Critical
>              Labels: reliability
>
> We sometimes launch jobs and cancel them if we don't get a TASK_STARTING back from the
> agent within ~7 seconds. Under certain conditions, this can result in Mesos losing track
> of the task. The interesting chunk of the logs is here:
> {code}
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task 'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill task Titus-7590548-worker-0-4476 of framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued task 'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
> {code}
> In our executor, we see that the launch message arrives after the master has already
> received the kill update. We then send non-terminal status updates to the agent, yet it
> doesn't forward them to our framework. We're using a custom executor based on the older
> mesos-go bindings.
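The log excerpt can be reduced to a toy model to make the ordering problem concrete: the task is queued for an executor, the kill arrives while it is still queued, the agent reports TASK_KILLED, and the queue is then drained anyway. The sketch below is a hypothetical, heavily simplified simulation, not the agent's actual slave.cpp logic; the names `ToyAgent`, `queue_task`, `kill_task`, `flush_queue`, and `replay` are illustrative only. It assumes, as the report describes, that later non-terminal updates for an already-killed task are not forwarded. The `fix_race` flag shows one possible remedy (dropping the killed task from the queue), offered as an assumption rather than the fix adopted for this issue.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical model of one executor's pending-task queue on an agent.

    Not the real Mesos implementation; it only mirrors the log sequence above.
    """
    fix_race: bool = False
    queued: list[str] = field(default_factory=list)       # tasks waiting for the executor
    terminal: set[str] = field(default_factory=set)       # tasks already reported terminal
    to_master: list[tuple[str, str]] = field(default_factory=list)  # forwarded updates
    to_executor: list[str] = field(default_factory=list)  # tasks delivered to the executor

    def queue_task(self, task_id: str) -> None:
        # Executor not ready yet, so the task waits ("Queuing task ...").
        self.queued.append(task_id)

    def kill_task(self, task_id: str) -> None:
        # Kill of a still-queued task: report it killed ("Asked to kill task ...").
        if task_id in self.queued:
            self.to_master.append((task_id, "TASK_KILLED"))
            self.terminal.add(task_id)
            if self.fix_race:
                # Assumed remedy: forget the task so it is never delivered later.
                self.queued.remove(task_id)

    def flush_queue(self) -> None:
        # Deliver everything still queued ("Sending queued task ...").
        for task_id in self.queued:
            self.to_executor.append(task_id)
        self.queued.clear()

    def status_update(self, task_id: str, state: str) -> None:
        # Assumption from the report: updates for terminal tasks are not forwarded.
        if task_id in self.terminal:
            return
        self.to_master.append((task_id, state))


def replay(fix_race: bool) -> ToyAgent:
    """Replay the logged order of events against the toy agent."""
    agent = ToyAgent(fix_race=fix_race)
    task = "Titus-7590548-worker-0-4476"
    agent.queue_task(task)
    agent.kill_task(task)
    agent.flush_queue()
    if task in agent.to_executor:
        # The executor starts the task and reports a non-terminal state.
        agent.status_update(task, "TASK_RUNNING")
    return agent


if __name__ == "__main__":
    for fix in (False, True):
        a = replay(fix)
        print(f"fix_race={fix}: to_master={a.to_master} to_executor={a.to_executor}")
```

With `fix_race=False`, the task is delivered to the executor even though the only update the master ever saw was TASK_KILLED, and the executor's TASK_RUNNING is swallowed, which matches the "lost track of the task" symptom. With `fix_race=True`, the task never reaches the executor.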



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
