mesos-dev mailing list archives

From Pankaj Saha <psa...@binghamton.edu>
Subject jobs are stuck in agents and staying in stagged state
Date Sun, 28 Aug 2016 21:26:04 GMT
Hi,
I am facing an issue with jobs launched on my Mesos agents. I am trying
to launch a job through the Marathon framework, but the job stays in the
staged state and never runs.
I can see the following log messages on the agent console:

Scheduling '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000' for gc 6.99999884239407days in the future
I0828 16:20:36.053483 28512 slave.cpp:1361] Got assigned task test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c for framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
I0828 16:20:36.056224 28510 gc.cpp:83] Unscheduling '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000' from gc
I0828 16:20:36.056715 28510 gc.cpp:83] Unscheduling '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000' from gc
I0828 16:20:36.057231 28509 slave.cpp:1480] Launching task test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c for framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
I0828 16:20:36.058661 28509 paths.cpp:528] Trying to chown '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-13adb145c12d' to user 'root'
I0828 16:20:36.067807 28509 slave.cpp:5352] Launching executor test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c of framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-13adb145c12d'
I0828 16:20:36.069314 28509 slave.cpp:1698] Queuing task 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' for executor 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' of framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
I0828 16:20:36.069902 28509 containerizer.cpp:666] Starting container '99620406-87b5-406c-a88b-13adb145c12d' for executor 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' of framework 'c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
I0828 16:20:36.080713 28509 linux_launcher.cpp:304] Cloning child process with flags =
I0828 16:20:36.084738 28509 containerizer.cpp:1179] Checkpointing executor's forked pid 29629 to '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-13adb145c12d/pids/forked.pid'


But after that, the job gets restarted and a new container is created
with a new process ID. This repeats indefinitely, which keeps the job in
the staged state as far as the Mesos master is concerned.
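
One way to confirm the restart loop from the agent log itself is to count the "Starting container" messages per application. The following is only a minimal sketch: it assumes the log format shown above and Marathon-style executor IDs of the form <app>.<uuid>, and the function names are hypothetical helpers, not part of Mesos or Marathon.

```python
import re
from collections import Counter

# Matches the containerizer message from the agent log above.
# \s+ tolerates line wrapping in a pasted log, and \*? tolerates
# emphasis asterisks that a mail client may have added.
START_RE = re.compile(
    r"Starting container\*?\s+'(?P<container>[^']+)'"
    r"\s+for\s+executor\s+'(?P<executor>[^']+)'"
)


def count_container_starts(log_text):
    """Return a Counter mapping app name -> number of containers started.

    Assumption: executor IDs look like '<app>.<uuid>', so the app name
    is everything before the last dot.
    """
    counts = Counter()
    for match in START_RE.finditer(log_text):
        app = match.group("executor").rsplit(".", 1)[0]
        counts[app] += 1
    return counts


def restarting_apps(log_text, threshold=2):
    """Return the apps whose containers were started at least `threshold` times."""
    counts = count_container_starts(log_text)
    return {app: n for app, n in counts.items() if n >= threshold}
```

Passing the full agent log text to restarting_apps should list test-crixus if its container really is being recreated over and over. If so, the stderr file in the executor work directory printed in the "Launching executor" line is a reasonable next place to look.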

This job is nothing but a simple echo "hello world" kind of shell command.
Can anyone please point out where it is failing or what I am doing wrong?



Thanks
Pankaj
