mesos-issues mailing list archives

From "James DeFelice (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-7752) Command executor still active after terminal task state update.
Date Tue, 11 Jul 2017 13:58:00 GMT

     [ https://issues.apache.org/jira/browse/MESOS-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James DeFelice updated MESOS-7752:
----------------------------------
    Labels: mesosphere  (was: )

> Command executor still active after terminal task state update.
> ---------------------------------------------------------------
>
>                 Key: MESOS-7752
>                 URL: https://issues.apache.org/jira/browse/MESOS-7752
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: A. Dukhovniy
>              Labels: mesosphere
>
> Here is a rather simple scenario to reproduce this error:
> * Framework starts a task with taskId = _task1_
> * Framework kills _task1_ *successfully* and *acknowledges* TASK_KILLED
> * Framework starts another task with the same taskId _task1_ but receives "_TASK_FAILED (Attempted to run multiple tasks using a "command" executor)_"
> *Note*: this test is racy, so this scenario fails only occasionally.
> *Here is a full log* that shows the life cycle of the task with ID _app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c_:
> {code:java}
> # Starting...
> WARN [10:51:14 ResidentTaskIntegrationTest-MesosMaster-32782] I0703 10:51:14.476085 14666
master.cpp:3352] Authorizing framework principal 'principal' to launch task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
> WARN [10:51:14 ResidentTaskIntegrationTest-MesosMaster-32782] I0703 10:51:14.510136 14666
master.cpp:4426] Launching task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
of framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000 (marathon) at scheduler-6dbbac16-7355-4a33-aee6-b9697c83e51c@127.0.1.1:61567
with resources...
> WARN [10:51:14 ResidentTaskIntegrationTest-MesosAgent-32788] I0703 10:51:14.513908 14697
slave.cpp:2118] Queued task 'app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c'
for executor 'app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c'
of framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosMaster-32782] I0703 10:51:15.011696 14671
master.cpp:6222] Forwarding status update TASK_RUNNING (UUID: ed2d0475-9d83-4e09-9f54-5b4d323e4558)
for task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
of framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosMaster-32782] I0703 10:51:15.036391 14671
master.cpp:5092] Processing ACKNOWLEDGE call ed2d0475-9d83-4e09-9f54-5b4d323e4558 for task
app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c of framework
76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000 (marathon) at scheduler-6dbbac16-7355-4a33-aee6-b9697c83e51c@127.0.1.1:61567
on agent 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-S0
> {code}
> {code:java}
> # Killing...
> DEBUG[10:51:15 ResidentTaskIntegrationTest-LocalMarathon-32800] WARN [10:51:15 KillAction$]
Killing known task [app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c]
of instance instance [app-restart-resident-app-with-five-instances.marathon-8882bd16-5fdd-11e7-a00e-0242aceef95c]
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosAgent-32788] I0703 10:51:15.196702 14697
slave.cpp:3816] Handling status update TASK_KILLED (UUID: f7e9d0bc-726c-43aa-9ddc-3b082a68642e)
for task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
of framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000 from executor(1)@172.16.10.121:35184
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosAgent-32788] I0703 10:51:15.197676 14697
slave.cpp:4166] Sending acknowledgement for status update TASK_KILLED (UUID: f7e9d0bc-726c-43aa-9ddc-3b082a68642e)
for task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
of framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000 to executor(1)@172.16.10.121:35184
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosMaster-32782] I0703 10:51:15.198299 14671
master.cpp:6154] Status update TASK_KILLED (UUID: f7e9d0bc-726c-43aa-9ddc-3b082a68642e) for
task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c of
framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000 from agent 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-S0
at slave(1)@172.16.10.121:32788 (172.16.10.121)
> DEBUG[10:51:15 ResidentTaskIntegrationTest-LocalMarathon-32800] INFO [10:51:15 MarathonScheduler]
Received status update for task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c:
TASK_KILLED (Command terminated with signal Terminated)
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosMaster-32782] I0703 10:51:15.216081 14671
master.cpp:5092] Processing ACKNOWLEDGE call f7e9d0bc-726c-43aa-9ddc-3b082a68642e for task
app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c of framework
76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000 (marathon) at scheduler-6dbbac16-7355-4a33-aee6-b9697c83e51c@127.0.1.1:61567
on agent 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-S0
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosMaster-32782] I0703 10:51:15.216107 14671
master.cpp:8396] Removing task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
with resources...
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosAgent-32788] I0703 10:51:15.216667 14697
status_update_manager.cpp:395] Received status update acknowledgement (UUID: f7e9d0bc-726c-43aa-9ddc-3b082a68642e)
for task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
of framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosAgent-32788] I0703 10:51:15.216722 14697
status_update_manager.cpp:832] Checkpointing ACK for status update TASK_KILLED (UUID: f7e9d0bc-726c-43aa-9ddc-3b082a68642e)
for task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
of framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000
> {code}
> {code:java}
> # and starting again:
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosMaster-32782] I0703 10:51:15.247561 14671
master.cpp:3352] Authorizing framework principal 'principal' to launch task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosAgent-32788] I0703 10:51:15.252348 14697
slave.cpp:1625] Got assigned task 'app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c'
for framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosAgent-32788] I0703 10:51:15.252707 14697
slave.cpp:1785] Launching task 'app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c'
for framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000
> WARN [10:51:15 ResidentTaskIntegrationTest-MesosAgent-32788] I0703 10:51:15.253159 14697
slave.cpp:2140] Queued task 'app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c'
for executor 'app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c'
of framework 76d8f3e7-8f3a-4764-bb7d-2bcf8e85e2be-0000 at executor(1)@172.16.10.121:35184
> DEBUG[10:51:15 ResidentTaskIntegrationTest-LocalMarathon-32800] INFO [10:51:15 MarathonScheduler]
Received status update for task app-restart-resident-app-with-five-instances.8882bd16-5fdd-11e7-a00e-0242aceef95c:
TASK_FAILED (Attempted to run multiple tasks using a "command" executor)
> {code}
> *TL;DR*: the framework receives and acknowledges the _TASK_KILLED_ status but fails to restart the task because of _"Attempted to run multiple tasks using a "command" executor"_
> Though reusing task IDs is discouraged
> {code:java}
> /**
>  * A framework-generated ID to distinguish a task. The ID must remain
>  * unique while the task is active. A framework can reuse an ID _only_
>  * if the previous task with the same ID has reached a terminal state
>  * (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.). However,
>  * reusing task IDs is strongly discouraged (MESOS-2198).
>  */{code}
> it is acceptable after receiving a terminal task status, which is what happened above.
> *Possible cause*:
> I assume that occasionally the executor has not yet been cleaned up and is reused when the task is restarted. The reuse then fails here: https://github.com/apache/mesos/blob/35dd2b600b8af0204d03c4ee5348a1a6672b136c/src/launcher/executor.cpp#L512
> /cc [~tillt]
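
The suspected race can be illustrated with a small toy model. This is only a sketch of the hypothesis in the description, not Mesos code: the `Agent` and `CommandExecutor` classes and all of their methods are hypothetical names invented for illustration. It assumes what the "Queued task ... for executor ..." log lines suggest, namely that a command task's executor ID equals its task ID, and that the agent may still hold the old executor when the second launch arrives.

```python
# Toy model of the suspected race. All names here are hypothetical and
# are NOT part of any Mesos API.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CommandExecutor:
    """An executor that accepts at most one task during its lifetime."""
    launched: bool = False

    def launch(self, task_id: str) -> str:
        if self.launched:
            # Mirrors the message the framework received in the log above.
            return ('TASK_FAILED (Attempted to run multiple tasks '
                    'using a "command" executor)')
        self.launched = True
        return "TASK_RUNNING"


@dataclass
class Agent:
    # Assumption: for command tasks the executor ID equals the task ID.
    executors: Dict[str, CommandExecutor] = field(default_factory=dict)

    def launch_task(self, task_id: str) -> str:
        # Reuse an executor that is still registered, otherwise start one.
        executor = self.executors.setdefault(task_id, CommandExecutor())
        return executor.launch(task_id)

    def kill_task(self, task_id: str) -> str:
        # The terminal update is reported, but the executor entry is
        # removed only later, by cleanup_executor().
        return "TASK_KILLED"

    def cleanup_executor(self, task_id: str) -> None:
        self.executors.pop(task_id, None)


def relaunch(cleanup_before_relaunch: bool) -> str:
    """Start, kill and restart 'task1'; return the status of the restart."""
    agent = Agent()
    assert agent.launch_task("task1") == "TASK_RUNNING"
    assert agent.kill_task("task1") == "TASK_KILLED"
    if cleanup_before_relaunch:
        agent.cleanup_executor("task1")
    return agent.launch_task("task1")


if __name__ == "__main__":
    print(relaunch(cleanup_before_relaunch=False))
    print(relaunch(cleanup_before_relaunch=True))
```

In this model the restart succeeds only when cleanup happens before the second launch. That ordering is not guaranteed in the scenario above, which would be consistent with the failure being intermittent.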



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
