mesos-issues mailing list archives

From "Avinash Sridharan (JIRA)" <>
Subject [jira] [Commented] (MESOS-5879) cgroups/net_cls isolator causing agent recovery issues
Date Thu, 21 Jul 2016 18:25:20 GMT


Avinash Sridharan commented on MESOS-5879:

Are you seeing the following error:

        "The secondary handle " + hexify(handle.secondary) +
        ", for the primary handle " + hexify(handle.primary) +
        " has already been allocated");

This should never happen unless your custom isolator is also modifying the net_cls
handles. If no actor other than the `NetClsHandleManager` is modifying these handles,
then we shouldn't see this problem, and there might be a different bug here, in which case
the fix is something other than just blindly destroying the orphan container.
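
To illustrate how this error can arise, here is a minimal sketch of a handle tracker that
refuses to reserve the same (primary, secondary) pair twice. It is not the actual
`NetClsHandleManager` code: `Handle`, `HandleTracker`, `reserve` and the local `hexify` helper
are hypothetical names used only for this example, and only the wording of the error message
is taken from the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for a net_cls handle: a 16-bit primary and a
// 16-bit secondary value. This is not the Mesos type.
struct Handle
{
  uint16_t primary;
  uint16_t secondary;
};

// Local helper for this sketch: formats a value as lowercase hex with a
// "0x" prefix.
std::string hexify(uint32_t value)
{
  std::ostringstream out;
  out << "0x" << std::hex << value;
  return out.str();
}

// Hypothetical tracker that records which secondary handles are in use
// under each primary handle.
class HandleTracker
{
public:
  // Returns true on success. Returns false and fills `error` if the pair
  // has already been reserved, for example because a second container was
  // recovered with the same handle.
  bool reserve(const Handle& handle, std::string* error)
  {
    std::set<uint16_t>& secondaries = used[handle.primary];

    // std::set::insert reports whether the element was newly inserted.
    if (!secondaries.insert(handle.secondary).second) {
      *error =
        "The secondary handle " + hexify(handle.secondary) +
        ", for the primary handle " + hexify(handle.primary) +
        " has already been allocated";
      return false;
    }

    return true;
  }

private:
  std::map<uint16_t, std::set<uint16_t>> used;
};

// Usage: two containers are recovered with the same handle, which is what
// would be observed if some other actor had assigned handles on its own.
void example()
{
  HandleTracker tracker;
  std::string error;

  assert(tracker.reserve(Handle{0x1, 0xa}, &error));   // First container.
  assert(!tracker.reserve(Handle{0x1, 0xa}, &error));  // Duplicate handle.
}
```

In this sketch a duplicate is only possible if two containers carry the same handle at
recovery time, which is why a second writer of the handles is the first thing to rule out.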

> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>                 Key: MESOS-5879
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups, isolation, slave
>            Reporter: Silas Snider
>            Assignee: Avinash Sridharan
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any agent process
> in a cluster running an experimental custom isolator as well, the agents are unable to recover
> from checkpoint, because net_cls reports that unknown orphan containers have duplicate net_cls
> While this is a problem that needs to be solved (probably by fixing our custom isolator),
> it's also a problem that the net_cls isolator fails recovery just for duplicate handles in
> cgroups that it is literally about to unconditionally destroy during recovery. Can this be

This message was sent by Atlassian JIRA
