mesos-issues mailing list archives

From "Sergey Galkin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-4999) Mesos (or Marathon) lost tasks
Date Tue, 22 Mar 2016 12:24:25 GMT

     [ https://issues.apache.org/jira/browse/MESOS-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Galkin updated MESOS-4999:
---------------------------------
    Description: 
After many create/delete cycles of applications with Docker instances through the Marathon API,
I have many lost tasks after the final *deletion of all applications in Marathon*.
They fall into three types:
1. Tasks hang in STAGED status. I don't see these tasks in 'docker ps' on the slave, and
_service docker restart_ on the Mesos slave did not fix them.
2. Tasks remain RUNNING because Docker hangs and can't delete these instances (a lot of
{code}
Killing docker task
Shutting down
Killing docker task
Shutting down
{code}
 in stdout). _docker stop ID_ hangs, and these tasks can be fixed by _service docker restart_
on the Mesos slave.
3. Tasks remain RUNNING even after _service docker restart_ on the Mesos slave.

Screenshot attached.

  was:
After many create/delete cycles of applications with Docker instances through the Marathon API,
I have many lost nodes after the final *deletion of all applications in Marathon*.
They fall into three types:
1. Tasks hang in STAGED status. I don't see these tasks in 'docker ps' on the slave, and
_service docker restart_ on the Mesos slave did not fix them.
2. Tasks remain RUNNING because Docker hangs and can't delete these instances (a lot of
{code}
Killing docker task
Shutting down
Killing docker task
Shutting down
{code}
 in stdout). _docker stop ID_ hangs, and these tasks can be fixed by _service docker restart_
on the Mesos slave.
3. Tasks remain RUNNING even after _service docker restart_ on the Mesos slave.

Screenshot attached.


> Mesos (or Marathon) lost tasks
> ------------------------------
>
>                 Key: MESOS-4999
>                 URL: https://issues.apache.org/jira/browse/MESOS-4999
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.2
>         Environment: mesos - 0.27.0
> marathon - 0.15.2
> 189 mesos slaves with Ubuntu 14.04.2 on HP ProLiant DL380 Gen9,
> CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @2.50GHz (48 cores (with hyperthreading))
> RAM - 264G,
> Storage - 3.0T on RAID on HP Smart Array P840 Controller,
> HDD - 12 x HP EH0600JDYTL
> Network - 2 x Intel Corporation Ethernet 10G 2P X710,
>            Reporter: Sergey Galkin
>         Attachments: mesos-nodes.png
>
>
> After many create/delete cycles of applications with Docker instances through the Marathon API,
> I have many lost tasks after the final *deletion of all applications in Marathon*.
> They fall into three types:
> 1. Tasks hang in STAGED status. I don't see these tasks in 'docker ps' on the slave, and
> _service docker restart_ on the Mesos slave did not fix them.
> 2. Tasks remain RUNNING because Docker hangs and can't delete these instances (a lot of
> {code}
> Killing docker task
> Shutting down
> Killing docker task
> Shutting down
> {code}
>  in stdout). _docker stop ID_ hangs, and these tasks can be fixed by _service docker restart_
> on the Mesos slave.
> 3. Tasks remain RUNNING even after _service docker restart_ on the Mesos slave.
> Screenshot attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
