mesos-user mailing list archives

From Ajay V <ajayv...@gmail.com>
Subject Mesos rare TASK_LOST scenario v 0.21.0
Date Wed, 10 Jan 2018 01:33:14 GMT
Hello,

I'm trying to debug a TASK_LOST that is generated on the agent and that I
see on rare occasions.

The following is a log that I'm trying to understand. This happens after
driver.sendStatusUpdate() has been called with a task state of
TASK_FINISHED from a Java executor. It looks to me like the container has
already exited before the TASK_FINISHED is processed. Is there a timing
issue in this version of Mesos that causes this? The effect of this problem
is that, even though the executor's work is complete and the executor calls
sendStatusUpdate() with TASK_FINISHED, the task is marked as TASK_LOST and
the actual TASK_FINISHED update is ignored.

I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container
'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited

I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container
'bb0e5f2d-4bdb-479c-b829-4741993c4109'

W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown
container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'

W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource
statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because:
Failed to get usage: No process found at 28952

I0108 10:16:52.899657 37278 slave.cpp:2898] Executor
'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
20171208-050805-140555025-5050-3470-0000 exited with status 0

I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-0000 from @0.0.0.0:0

I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task
ff631ad1-cfab-493e-be18-961581abcf3d

W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for
unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109

I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received status
update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-0000

I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding
update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-0000 to the slave

I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update TASK_LOST
(UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-0000 to master@17.179.96.8:5050

I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager
successfully handled status update TASK_LOST (UUID:
f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-0000

I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for
task ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-0000

I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up
status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of
framework 20171208-050805-140555025-5050-3470-0000

I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager
successfully handled status update acknowledgement (UUID:
f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-0000

I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task
ff631ad1-cfab-493e-be18-961581abcf3d

I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor
'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
20171208-050805-140555025-5050-3470-0000
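
Since the mail client wrapped the log lines above, here is a minimal Python
sketch for rejoining them and listing which status updates the agent handled
for a task. It assumes the usual glog prefix (severity letter, MMDD, time
with microseconds, thread id, file:line]). The names parse_glog and
status_updates are hypothetical helpers, not part of Mesos, and the year
2018 is an assumption taken from the Date header because glog does not
print one. The sample reuses two records from the log above.

```python
import re
from datetime import datetime

# glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) +"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def parse_glog(text, year=2018):
    """Rejoin mail-wrapped lines and return one dict per log record.

    `year` is an assumption: glog omits it, so the caller supplies it.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = GLOG.match(line)
        if m:
            rec = m.groupdict()
            rec["ts"] = datetime.strptime(
                f"{year}{rec['date']} {rec['time']}", "%Y%m%d %H:%M:%S.%f"
            )
            records.append(rec)
        elif records:
            # No glog prefix: treat as the wrapped tail of the previous record.
            records[-1]["msg"] += " " + line
    return records


def status_updates(records, task_id):
    """Return (timestamp, state) for each 'Handling status update' record."""
    pat = re.compile(
        r"Handling status update (TASK_\w+) .*for task " + re.escape(task_id)
    )
    found = []
    for rec in records:
        m = pat.search(rec["msg"])
        if m:
            found.append((rec["ts"], m.group(1)))
    return found


SAMPLE = """
I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container
'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited

I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-0000 from @0.0.0.0:0
"""

records = parse_glog(SAMPLE)
updates = status_updates(records, "ff631ad1-cfab-493e-be18-961581abcf3d")
for ts, state in updates:
    print(ts.time(), state)
```

Running the same helpers over the full agent log would show whether a
"Handling status update TASK_FINISHED" record for the task ever appears
before the TASK_LOST one.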

Regards,
Ajay
