mesos-issues mailing list archives

From "Sathish Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
Date Fri, 20 Jan 2017 06:59:26 GMT

     [ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sathish Kumar updated MESOS-6952:
---------------------------------
    Description: 
The task was stuck in the staging state for almost 6 hours, even after its executor had terminated
on the slave. Because the task was stuck in staging, the framework never received an update from
mesos-master.

The issue was resolved after a slave restart, and the task was moved from staging to the task lost
state.

The slave logs show the warning "Asked to run task '...' which is terminating/terminated".
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:1484816820000:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' which is terminating/terminated
{noformat}
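
For readers who want to scan a larger slave log for the same warning, the following is a minimal Python sketch (not part of the original report). It assumes the entries use the glog prefix seen above (severity letter, MMDD, time, thread id, `file:line]`) and that each entry may carry a grep file-name prefix and mail line wraps. The names `GLOG_RE` and `parse_entry` are hypothetical helpers introduced only for this illustration.

```python
import re

# glog-style prefix as seen in the excerpt above, for example:
#   W0119 14:42:17.097527 107774 slave.cpp:1673] message...
GLOG_RE = re.compile(
    r"(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+)\s+"
    r"(?P<source>[\w.]+):(?P<line>\d+)\]\s+"
    r"(?P<message>.*)"
)


def parse_entry(text):
    """Parse one log entry, tolerating line wraps and a grep file prefix.

    Returns a dict of the named fields, or None if no glog prefix is found.
    """
    # Collapse the line wraps introduced by the mail formatting.
    flat = " ".join(text.split())
    match = GLOG_RE.search(flat)
    return match.groupdict() if match else None
```

An entry can then be checked with, for example, `entry["severity"] == "W" and entry["message"].endswith("which is terminating/terminated")`.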

Full slave log:
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858510 107759 slave.cpp:1361] Got assigned task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858762 107759 slave.cpp:1480] Launching task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.859004 107759 slave.cpp:1711] Queuing task 'ct:1484816820000:0:foocare_zendesk_round_robin:' for executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 at executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.939483 107759 slave.cpp:1863] Sending queued task 'ct:1484816820000:0:foocare_zendesk_round_robin:' to executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 at executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.141394 107762 slave.cpp:3871] Executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 exited with status 0
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.141451 107762 slave.cpp:3012] Handling status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from @0.0.0.0:0
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.141849 107762 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.141989 107762 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.147343 107766 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089175 107759 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:1484816820000:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' which is terminating/terminated
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097568 107774 slave.cpp:3012] Handling status update TASK_LOST (UUID: b999fb64-34f0-496d-be19-f5a7f998230e) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from @0.0.0.0:0
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097633 107774 slave.cpp:3975] Cleaning up executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 at executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097790 107772 gc.cpp:55] Scheduling '/data/mesos/slaves/22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56/frameworks/19393553-2061-4d2f-8c05-a0ba688334f4-0001/executors/ct:1484816820000:0:foocare_zendesk_round_robin:/runs/6b8922ff-3f57-42a0-97d1-d79c1de3d93b' for gc 6.99999886874074days in the future
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097836 107772 gc.cpp:55] Scheduling '/data/mesos/slaves/22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56/frameworks/19393553-2061-4d2f-8c05-a0ba688334f4-0001/executors/ct:1484816820000:0:foocare_zendesk_round_robin:' for gc 6.99999886832296days in the future
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097869 107772 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56/frameworks/19393553-2061-4d2f-8c05-a0ba688334f4-0001/executors/ct:1484816820000:0:foocare_zendesk_round_robin:/runs/6b8922ff-3f57-42a0-97d1-d79c1de3d93b' for gc 6.99999886819259days in the future
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097888 107772 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56/frameworks/19393553-2061-4d2f-8c05-a0ba688334f4-0001/executors/ct:1484816820000:0:foocare_zendesk_round_robin:' for gc 6.99999886809185days in the future
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.WARNING.20161004-154318.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:1484816820000:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' which is terminating/terminated
{noformat}
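
One way to read the full log above is to pair each "Handling status update" line with a later "Received status update acknowledgement" line carrying the same UUID. The Python sketch below (an illustration, not part of the original report) lists the updates that have no such acknowledgement in a given piece of log text. The function name `unacknowledged_updates` is hypothetical, and the message wording is taken only from the lines shown above. The log appears to be filtered grep output, so a missing acknowledgement here does not by itself establish the root cause.

```python
import re

# Message fragments as they appear in the slave log above.
HANDLED_RE = re.compile(
    r"Handling status update (\w+) \(UUID: ([0-9a-f-]+)\)"
)
ACKED_RE = re.compile(
    r"Received status update acknowledgement \(UUID: ([0-9a-f-]+)\)"
)


def unacknowledged_updates(log_text):
    """Return (state, uuid) pairs that were handled but never acknowledged.

    Pairs are returned in the order in which the updates first appear.
    """
    # Collapse line wraps so that a message split across lines still matches.
    flat = " ".join(log_text.split())

    handled = {}
    for state, uuid in HANDLED_RE.findall(flat):
        handled.setdefault(uuid, state)

    acked = set(ACKED_RE.findall(flat))
    return [(state, uuid) for uuid, state in handled.items() if uuid not in acked]
```

In the log above, both TASK_FAILED updates have a matching acknowledgement line, while the TASK_LOST update (UUID b999fb64-34f0-496d-be19-f5a7f998230e) appears only in its "Handling status update" line.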



  was:
The task was stuck in the staging state for almost 6 hours, even after its executor had terminated
on the slave. Because the task was stuck in staging, the framework never received an update from
mesos-master.

The issue was resolved after a slave restart, and the task was removed from staging to the task lost
state.

The slave logs show the warning "Asked to run task '...' which is terminating/terminated".




> Mesos task state was stuck in staging even after executor terminated
> --------------------------------------------------------------------
>
>                 Key: MESOS-6952
>                 URL: https://issues.apache.org/jira/browse/MESOS-6952
>             Project: Mesos
>          Issue Type: Bug
>          Components: executor
>    Affects Versions: 0.28.2
>         Environment: ubuntu 14.04
>            Reporter: Sathish Kumar
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
