mesos-issues mailing list archives

From "John Garcia (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-5836) Cgroup Leakage in 4.2, 4.4, 4.5 kernels
Date Tue, 12 Jul 2016 19:09:20 GMT

     [ https://issues.apache.org/jira/browse/MESOS-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Garcia updated MESOS-5836:
-------------------------------
    Description: 
We've noticed an issue with kernel versions 4.2, 4.4, and 4.5 where memory cgroups are not
cleaned up by the system. When the register fills up with 65336 cgroups, additional cgroups
cannot be created because there are no IDs left for the new cgroup, and ENOSPC is returned.
This is a concern for the Mesos project because Mesos cannot create any further containers in
this state. In our tests, Docker 1.8.3 silently fails to create the memory cgroup,
resulting in rogue containers with no memory limit.

h3. Steps to reproduce:
*NOTE: Mesos is not required to reproduce this issue*

- Start a new instance using kernel 4.2, 4.4, or 4.5 (CoreOS 766-1010, Ubuntu 16.04) 
- ssh to the machine
- {{cat /proc/cgroups}} to determine the number of memory cgroups
- Run several docker containers using the {{--memory}} or {{-m}} option to set a memory limit,
either in parallel or in series
- Stop all containers
- {{cat /proc/cgroups}} to review the number of memory cgroups and compare to previous run
- Optional: Run 65,336 docker containers using memory isolation and then try to launch a Mesos
container
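
The before/after comparison in the steps above can be scripted. The following is a minimal sketch, not part of the original report: {{memcg_count}} is a hypothetical helper name, and it assumes the usual {{/proc/cgroups}} column layout (subsys_name, hierarchy, num_cgroups, enabled).

```shell
#!/bin/sh
# memcg_count (hypothetical helper): print the num_cgroups column for the
# memory subsystem. Assumed columns: subsys_name hierarchy num_cgroups enabled.
# The optional argument allows pointing the parser at a saved copy of the file.
memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Example use around a batch of containers:
#   before=$(memcg_count)
#   ... run and stop containers started with `docker run -m ...` ...
#   after=$(memcg_count)
#   echo "memory cgroups: before=$before after=$after"
```

On an affected kernel the second number would be expected to stay higher than the first even after all containers have stopped.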

h3. Differential diagnosis:

When the cgroup limit is exceeded, subsequent container terminations produce the following
error in {{dmesg}}:
{code}idr_remove called for id=65536 which is not allocated.{code}
Subsequent efforts to create a cgroup folder will fail:
{code}/sys/fs/cgroup/memory/mesos $ df .
Filesystem     1K-blocks  Used Available Use% Mounted on
cgroup                 0     0         0    - /sys/fs/cgroup/memory
/sys/fs/cgroup/memory/mesos $ sudo mkdir foo
mkdir: cannot create directory 'foo': No space left on device{code}
Subsequently launched Docker containers will fail to utilize memory isolation:
{code}/sys/fs/cgroup/memory/mesos $ docker run -m 32m -d example/busybox sleep 10000

...

/sys/fs/cgroup/memory/mesos $ docker ps | grep busybox
849c66081229   example/busybox   "sleep 10000"   6 seconds ago   Up 4 seconds   suspicious_mahavira

/sys/fs/cgroup/memory/mesos $ find /sys/fs/cgroup -name "*849c66081229*"
/sys/fs/cgroup/blkio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/freezer/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/devices/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpuset/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/net_cls,net_prio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/systemd/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/memory/mesos $ {code}
Mesos containerizer will fail with {{No space left on device}}:
{code}E0707 20:17:29.091142 105665 slave.cpp:3802] Container 'ef5419cf-9d00-425a-a9ee-a848d330bfb2'
for executor 'node-0_executor__42a4fafe-f64d-4b41-91d2-efc20a86a6a3' of framework d6ab251a-064a-46a0-a1c8-9ee559f3b44a-0023
failed to start: Failed to prepare isolator: Failed to create directory '/sys/fs/cgroup/memory/mesos/ef5419cf-9d00-425a-a9ee-a848d330bfb2':
No space left on device
{code}
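
The {{find}} output above lists every hierarchy except {{memory}}, which is how the missing isolation shows up. A minimal sketch of that check as a reusable test follows; {{has_memory_cgroup}} is a hypothetical helper name, and it assumes the cgroup v1 layout shown in this report, with one directory per hierarchy under the cgroup root.

```shell
#!/bin/sh
# has_memory_cgroup (hypothetical helper): succeed if a directory whose name
# contains the given container ID exists under <cgroup_root>/memory.
# Assumes a cgroup v1 layout such as /sys/fs/cgroup/memory/...
has_memory_cgroup() {
    root=$1
    id=$2
    [ -n "$(find "$root/memory" -type d -name "*$id*" 2>/dev/null | head -n 1)" ]
}

# Example use with the container ID from the transcript above:
#   has_memory_cgroup /sys/fs/cgroup 849c66081229 \
#       || echo "container is running without a memory cgroup"
```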

h3. Remediation

Once a system is found to be affected, the following command can be used to drop all page
caches, which allows the system to reap all of the old cgroups and return to normal operation.
{code}echo 1 > /proc/sys/vm/drop_caches{code}
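
As a stopgap, the command above could be run only when the memory cgroup count gets close to the limit. This is a sketch under stated assumptions, not something we have deployed: {{reap_if_needed}} is a hypothetical helper, the threshold is an arbitrary value to be chosen below the limit, and the third argument exists only so the write target can be overridden when trying the function out.

```shell
#!/bin/sh
# reap_if_needed (hypothetical helper): write 1 to the drop_caches file when
# the observed memory cgroup count has reached the given threshold.
#   $1 = current memory cgroup count
#   $2 = threshold (an assumption; pick a value below the ID limit)
#   $3 = file to write to (defaults to /proc/sys/vm/drop_caches, needs root)
reap_if_needed() {
    count=$1
    threshold=$2
    target=${3:-/proc/sys/vm/drop_caches}
    if [ "$count" -ge "$threshold" ]; then
        echo 1 > "$target"
        echo "dropped page cache at $count memory cgroups"
    fi
}

# Example periodic check (run as root); 60000 is an assumed threshold:
#   count=$(awk '$1 == "memory" { print $3 }' /proc/cgroups)
#   reap_if_needed "$count" 60000
```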

We suspect that [patch 9184539|https://patchwork.kernel.org/patch/9184539/] could fix it,
but we have not yet tested.

  was:
We've noticed an issue with kernel versions 4.2, 4.4, and 4.5 where memory cgroups are not
cleaned up by the system. When the register fills up with 65336 cgroups, additional cgroups
cannot be formed because there's no IDs for the new cgroup, and ENOSPC is returned. This is
a concern for the Mesos project because no further containers can be created by Mesos in this
state. We tested Docker 1.8.3, and Docker 1.8.3 will silently fail to build the memory cgroup,
resulting in rogue containers that are memory-unbound.

h3. Steps to reproduce:
*NOTE: Mesos is not required to reproduce this issue*

- Start a new instance using kernel 4.2, 4.4, or 4.5 (CoreOS 766-1010, Ubuntu 16.04) 
- ssh to the machine
- {{cat /proc/cgroups}} to determine the number of memory cgroups
- Run several docker containers using the {{--memory}} or {{-m}} option to set a memory isolator,
either in parallel or in series
- Stop all containers
- {{cat /proc/cgroups}} to review the number of memory cgroups and compare to previous run
- Optional: Run 65,336 docker containers using memory isolation and then try to launch a Mesos
container

h3. Differential diagnosis:

When the cgroup limit is exceeded, subsequent container terminations will draw the following
error in {{dmesg}}:
{code}idr_remove called for id=65536 which is not allocated.{code}
Subsequent efforts to create a cgroup folder will fail:
{code}/sys/fs/cgroup/memory/mesos $ df .
Filesystem     1K-blocks  Used Available Use% Mounted on
cgroup                 0     0         0    - /sys/fs/cgroup/memory
/sys/fs/cgroup/memory/mesos $ sudo mkdir foo
mkdir: cannot create directory 'foo': No space left on device{code}
Subsequently launched Docker containers will fail to utilize memory isolation: {code}/sys/fs/cgroup/memory/mesos
$ docker run -m 32m -d 10.1.13.1:9000/montana/busybox sleep 10000

...

/sys/fs/cgroup/memory/mesos $ docker ps | grep busybox
849c66081229        example/busybox                                                      
  "sleep 10000"            6 seconds ago       Up 4 seconds                              
                                                     suspicious_mahavira

/sys/fs/cgroup/memory/mesos $ find /sys/fs/cgroup -name "*849c66081229*"
/sys/fs/cgroup/blkio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/freezer/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/devices/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpuset/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/net_cls,net_prio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/systemd/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/memory/mesos $ {code}
Mesos containerizer will fail with {{No space left on device}}:
{code}E0707 20:17:29.091142 105665 slave.cpp:3802] Container 'ef5419cf-9d00-425a-a9ee-a848d330bfb2'
for executor 'node-0_executor__42a4fafe-f64d-4b41-91d2-efc20a86a6a3' of framework d6ab251a-064a-46a0-a1c8-9ee559f3b44a-0023
failed to start: Failed to prepare isolator: Failed to create directory '/sys/fs/cgroup/memory/mesos/ef5419cf-9d00-425a-a9ee-a848d330bfb2':
No space left on device
{code}

h3. Remediation

Once a system is found to be affected, the following command can be used to drop all page
caches, which allows the system to reap all of the old cgroups and return to normal operation.
{code}echo 1 > /proc/sys/vm/drop_caches{code}

We suspect that [patch 9184539|https://patchwork.kernel.org/patch/9184539/] could fix it,
but we have not yet tested.


> Cgroup Leakage in 4.2, 4.4, 4.5 kernels
> ---------------------------------------
>
>                 Key: MESOS-5836
>                 URL: https://issues.apache.org/jira/browse/MESOS-5836
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.1, 0.28.2, 1.0.0, 1.1.0
>            Reporter: John Garcia
>              Labels: mesosphere



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
