mesos-issues mailing list archives

From "Avinash Sridharan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-5879) cgroups/net_cls isolator causing agent recovery issues
Date Thu, 21 Jul 2016 20:46:20 GMT

    [ https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388397#comment-15388397 ]

Avinash Sridharan commented on MESOS-5879:
------------------------------------------

Could you clarify whether the custom isolator you are testing is also trying to manipulate the
net_cls handles? Or, for that matter, whether some other entity in the environment is? If so,
that is a bigger problem. I am not comfortable allowing the isolator to proceed when handles
have been misallocated. Doing so can have unexpected consequences, which is why the isolator
reports an error and bails out rather than trying to live with the problem.
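
To make the check under discussion concrete, here is a minimal sketch, not the actual Mesos
implementation. It assumes each container's net_cls.classid has already been read as a 32-bit
integer (the kernel's 0xAAAABBBB major:minor layout). The function names, the container IDs
and the known/orphans split are hypothetical and only illustrate failing recovery when a
handle is claimed more than once.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Handle = Tuple[int, int]  # (major, minor) halves of a net_cls classid


def split_classid(classid: int) -> Handle:
    """Split a 32-bit net_cls classid (0xAAAABBBB) into (major, minor)."""
    return (classid >> 16) & 0xFFFF, classid & 0xFFFF


def find_duplicate_handles(classids: Dict[str, int]) -> Dict[Handle, List[str]]:
    """Return every handle used by more than one container, with its users."""
    owners: Dict[Handle, List[str]] = defaultdict(list)
    for container_id, classid in classids.items():
        owners[split_classid(classid)].append(container_id)
    return {h: sorted(ids) for h, ids in owners.items() if len(ids) > 1}


def recover(known: Dict[str, int], orphans: Dict[str, int]) -> None:
    """Hypothetical recovery step: bail out if any handle is claimed twice.

    Orphans are included in the check, which mirrors the behaviour the
    reporter is questioning; this is an assumption, not Mesos code.
    """
    duplicates = find_duplicate_handles({**known, **orphans})
    if duplicates:
        raise RuntimeError("duplicate net_cls handles: %r" % (duplicates,))
```

Under these assumptions, an orphan sharing a classid with a checkpointed container makes
recover() raise, whereas passing only the known containers would let recovery continue.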

> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>
>                 Key: MESOS-5879
>                 URL: https://issues.apache.org/jira/browse/MESOS-5879
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups, isolation, slave
>            Reporter: Silas Snider
>            Assignee: Avinash Sridharan
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any agent process
> in a cluster running an experimental custom isolator as well, the agents are unable to recover
> from checkpoint, because net_cls reports that unknown orphan containers have duplicate net_cls
> handles.
> While this is a problem that needs to be solved (probably by fixing our custom isolator),
> it's also a problem that the net_cls isolator fails recovery just for duplicate handles in
> cgroups that it is literally about to unconditionally destroy during recovery. Can this be
> fixed?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
