mesos-dev mailing list archives

From haosdent <haosd...@gmail.com>
Subject Re: jobs are stuck in agents and staying in stagged state
Date Mon, 29 Aug 2016 11:25:17 GMT
Hi @Pankaj, could you provide the logs from the period when "the job is
getting restarted and a new container is created with a new process id"?
The logs you provided look normal.
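
One way to pull the relevant lines out of a saved agent log is sketched below. It assumes the log uses the glog line layout seen in the quoted output (severity and date, time, thread id, source file and line, then the message). The names `GLOG_LINE`, `LogRecord`, and `task_log_lines` are illustrative and not part of Mesos.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed glog layout, matching lines such as:
#   I0828 16:20:36.053483 28512 slave.cpp:1361] Got assigned task ...
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


class LogRecord(NamedTuple):
    severity: str  # I, W, E or F
    time: str      # HH:MM:SS.micros as printed in the log
    source: str    # source file that emitted the line, e.g. slave.cpp
    message: str   # everything after the "] " separator


def task_log_lines(lines: Iterable[str], task_id: str) -> List[LogRecord]:
    """Return parsed records whose message mentions task_id.

    Lines that do not match the assumed glog layout are skipped.
    """
    records = []
    for raw in lines:
        match = GLOG_LINE.match(raw.rstrip("\n"))
        if match and task_id in match.group("message"):
            records.append(
                LogRecord(
                    match.group("severity"),
                    match.group("time"),
                    match.group("source"),
                    match.group("message"),
                )
            )
    return records
```

Calling `task_log_lines(open("agent.log"), "test-crixus")` (with `agent.log` standing in for wherever the agent output was saved) would return every record that mentions the task, including any warnings or errors logged around a restart.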

On Mon, Aug 29, 2016 at 5:26 AM, Pankaj Saha <psaha4@binghamton.edu> wrote:

> Hi,
> I am facing an issue with jobs launched on my Mesos agents. I am trying
> to launch a job through the Marathon framework, but the job stays in the
> staging state and never runs.
> I can see the following log messages on the agent console:
>
>  Scheduling
> '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/
> frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
> for gc 6.99999884239407days in the future
> I0828 16:20:36.053483 28512 slave.cpp:1361] *Got assigned task
> test-crixus*.eb66a42b-6d5c-11e6-bec9-c27afc834a0c
> for framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
> I0828 16:20:36.056224 28510 gc.cpp:83] Unscheduling
> '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-
> 82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
> from gc
> I0828 16:20:36.056715 28510 gc.cpp:83] Unscheduling
> '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/
> frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
> from gc
> I0828 16:20:36.057231 28509 slave.cpp:1480] *Launching task
> test-crixus*.eb66a42b-6d5c-11e6-bec9-c27afc834a0c
> for framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
> I0828 16:20:36.058661 28509 paths.cpp:528]* Trying to chown*
> '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-
> 82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-
> 72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-
> 11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-13adb145c12d'
> to user 'root'
> I0828 16:20:36.067807 28509 slave.cpp:5352]* Launching executor
> test-crixus*.eb66a42b-6d5c-11e6-bec9-c27afc834a0c
> of framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 with resources
> cpus(*):0.1; mem(*):32 in work directory
> '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-
> 82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-
> 72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-
> 11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-13adb145c12d'
> I0828 16:20:36.069314 28509 slave.cpp:1698] *Queuing task
> 'test-crixus.*eb66a42b-6d5c-11e6-bec9-c27afc834a0c'
> for executor 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' of
> framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
> I0828 16:20:36.069902 28509 containerizer.cpp:666] *Starting container*
> '99620406-87b5-406c-a88b-13adb145c12d' for executor
> 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' of framework
> 'c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
> I0828 16:20:36.080713 28509 linux_launcher.cpp:304] *Cloning child process*
> with flags =
> I0828 16:20:36.084738 28509 containerizer.cpp:1179] *Checkpointing
> executor's forked pid 29629* to
> '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/
> frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/
> executors/test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c/runs/99620406-
> 87b5-406c-a88b-13adb145c12d/pids/forked.pid'
>
>
> But after that, the job is getting restarted and a new container is created
> with a new process id. This happens indefinitely, which keeps the job in the
> staging state on the mesos-master.
>
> This job is nothing but a simple echo "hello world" kind of shell command.
> Can anyone please point out where it is failing or what I am doing wrong?
>
>
>
> Thanks
> Pankaj
>



-- 
Best Regards,
Haosdent Huang
