incubator-mesos-dev mailing list archives

From "Charles Reiss (Created) (JIRA)" <>
Subject [jira] [Created] (MESOS-47) Kill entire containers on OOM with LXC isolation module
Date Wed, 26 Oct 2011 20:47:32 GMT
Kill entire containers on OOM with LXC isolation module

                 Key: MESOS-47
             Project: Mesos
          Issue Type: Improvement
          Components: isolation
         Environment: Linux with container-based isolation
            Reporter: Charles Reiss

When using the LXC isolation module, the kernel OOM killer kills a single victim process in the
container when the container exceeds its memory limit. When the container contains multiple
processes, the rest keep running without the victim, which can cause failures that are hard to diagnose.

Instead, Mesos should use the memory cgroup's oom_control feature to disable OOM kills (processes
requesting memory then block) and have the slave be notified of OOM events through an eventfd.
When the slave receives an OOM notification on this eventfd, it should kill all processes in the
over-limit executor's container.
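
The proposal above could be sketched roughly as follows. This is only an illustration in Python, not Mesos code: it assumes the cgroup v1 memory controller files (memory.oom_control, cgroup.event_control, cgroup.procs), Python 3.10+ for os.eventfd, and a caller-supplied cgroup directory. All function names are hypothetical.

```python
import os
import signal


def disable_oom_killer(cgroup_dir):
    """Make tasks that hit the limit block instead of being OOM-killed."""
    with open(os.path.join(cgroup_dir, "memory.oom_control"), "w") as f:
        f.write("1")


def register_oom_eventfd(cgroup_dir):
    """Return (event_fd, control_fd); event_fd becomes readable on OOM."""
    event_fd = os.eventfd(0)  # Linux only, Python 3.10+
    control_fd = os.open(os.path.join(cgroup_dir, "memory.oom_control"),
                         os.O_RDONLY)
    # cgroup v1 registration format: "<event_fd> <fd of the control file>"
    with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as f:
        f.write(f"{event_fd} {control_fd}")
    return event_fd, control_fd


def container_pids(cgroup_dir):
    """List the process IDs currently in the container's cgroup."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def kill_container(cgroup_dir, kill=os.kill):
    """Send SIGKILL to every process in the cgroup; return the PIDs signalled.

    A single pass; a real implementation would likely repeat until the
    cgroup is empty, since processes may fork in between.
    """
    pids = container_pids(cgroup_dir)
    for pid in pids:
        try:
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process already exited
    return pids


def watch_and_kill(cgroup_dir):
    """Block until the container reports OOM, then kill the whole container."""
    disable_oom_killer(cgroup_dir)
    event_fd, control_fd = register_oom_eventfd(cgroup_dir)
    try:
        os.read(event_fd, 8)  # blocks until the kernel signals an OOM event
        return kill_container(cgroup_dir)
    finally:
        os.close(event_fd)
        os.close(control_fd)
```

In the slave, the eventfd would more likely be added to an existing event loop than read in a blocking call; watch_and_kill blocks only to keep the sketch short.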

(These OOM events only happen when a container exceeds its hard memory limit. If Mesos overcommits
memory in the future, it should have an outer cgroup with hard memory limits and
memory.use_hierarchy enabled on which to receive OOM events (so they don't turn into global
OOM kills). Mesos will then need code to determine which executors are exceeding their
"soft" memory limits and to choose a victim executor.)

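The issue does not say how a victim executor would be chosen in the overcommit case. As one possible sketch, the following Python picks the executor that exceeds its soft limit by the most bytes. The policy, the function name, and the input shape (executor ID mapped to a usage and soft-limit pair, which could come from memory.usage_in_bytes and memory.soft_limit_in_bytes in cgroup v1) are all assumptions made for illustration.

```python
def choose_victim(executors):
    """Pick the executor furthest over its soft memory limit.

    `executors` maps an executor ID to (usage_bytes, soft_limit_bytes).
    Returns None when no executor exceeds its soft limit.
    Hypothetical policy: largest overage in bytes wins.
    """
    over_limit = {
        executor_id: usage - soft_limit
        for executor_id, (usage, soft_limit) in executors.items()
        if usage > soft_limit
    }
    if not over_limit:
        return None
    return max(over_limit, key=over_limit.get)
```

Other policies (largest relative overage, lowest priority, most recently started) would fit the same interface.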