mesos-issues mailing list archives

From "James DeFelice (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-4565) slave recovers and attempt to destroy executor's child containers, then begins rejecting task status updates
Date Thu, 04 Feb 2016 05:56:39 GMT

    [ https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131778#comment-15131778 ]

James DeFelice commented on MESOS-4565:
---------------------------------------

To be clear, the custom executor in this case is using the native
containerizer, not the Docker one.



> slave recovers and attempt to destroy executor's child containers, then begins rejecting task status updates
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-4565
>                 URL: https://issues.apache.org/jira/browse/MESOS-4565
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 0.26.0
>            Reporter: James DeFelice
>              Labels: mesosphere
>
> AFAICT the slave is doing this:
> 1) recovering from some kind of failure
> 2) checking the containers that it pulled from its state store
> 3) complaining about cgroup children hanging off of executor containers
> 4) rejecting task status updates related to the executor container, the first of which in the logs is:
> {code}
> E0130 02:22:21.979852 12683 slave.cpp:2963] Failed to update resources for container 1d965a20-849c-40d8-9446-27cb723220a9 of executor 'd701ab48a0c0f13_k8sm-executor' running task pod.f2dc2c43-c6f7-11e5-ad28-0ad18c5e6c7f on status update for terminal task, destroying container: Container '1d965a20-849c-40d8-9446-27cb723220a9' not found
> {code}
> To be fair, I don't believe that my custom executor is re-registering properly with the slave prior to attempting to send these (failing) status updates. But the slave doesn't complain about that; it complains that it can't find the **container**.
> Slave log here:
> https://gist.github.com/jdef/265663461156b7a7ed4e
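
When going through the linked slave log, it may help to pull the container ID, executor ID, and task ID out of lines like the error quoted above. The following is a minimal Python sketch, assuming the usual glog prefix layout (severity letter, MMDD date, time, thread ID, file:line) and the exact message wording from the quoted error. The function name and both regular expressions are illustrative assumptions, not part of Mesos.

```python
import re

# Assumed glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_PREFIX = re.compile(
    r"^(?P<severity>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)

# Wording copied from the quoted error; other slave messages will not match.
UPDATE_FAILURE = re.compile(
    r"Failed to update resources for container (?P<container>\S+) "
    r"of executor '(?P<executor>[^']+)' running task (?P<task>\S+)"
)


def parse_update_failure(line):
    """Return a dict of fields for a matching log line, or None otherwise."""
    prefix = GLOG_PREFIX.match(line)
    if prefix is None:
        return None
    failure = UPDATE_FAILURE.search(prefix.group("message"))
    if failure is None:
        return None
    fields = prefix.groupdict()
    fields.pop("message")  # keep only the structured fields
    fields.update(failure.groupdict())
    return fields


# The error line from the issue description, joined onto one line.
SAMPLE = (
    "E0130 02:22:21.979852 12683 slave.cpp:2963] Failed to update resources "
    "for container 1d965a20-849c-40d8-9446-27cb723220a9 of executor "
    "'d701ab48a0c0f13_k8sm-executor' running task "
    "pod.f2dc2c43-c6f7-11e5-ad28-0ad18c5e6c7f on status update for terminal "
    "task, destroying container: Container "
    "'1d965a20-849c-40d8-9446-27cb723220a9' not found"
)

if __name__ == "__main__":
    print(parse_update_failure(SAMPLE))
```

Collecting the container IDs this way and comparing them against the containers the slave reports during recovery might show whether the "not found" container was one of those it tried to destroy.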



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
