aurora-issues mailing list archives

From "John Sirois (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1662) No Memory and CPU Enforcement
Date Tue, 12 Apr 2016 22:26:25 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238115#comment-15238115 ]

John Sirois commented on AURORA-1662:
-------------------------------------

See the relevant mesos-slave (0.26.0) flags below. If your kernels support cgroups, you want cgroups-based {{--isolation}} (e.g. {{cgroups/cpu,cgroups/mem}}) instead of the default {{posix/cpu,posix/mem}}.
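On the "if your kernels support cgroups" part: one rough way to check an agent host is to look at /proc/cgroups, which lists each cgroup subsystem the kernel knows about and whether it is enabled. A minimal sketch follows; the function names are hypothetical, it only looks at the cpu and memory subsystems, and it does not check whether they are mounted where {{--cgroups_hierarchy}} expects.

```python
from typing import Dict


def parse_proc_cgroups(text: str) -> Dict[str, bool]:
    """Map each subsystem listed in /proc/cgroups to its 'enabled' column."""
    enabled: Dict[str, bool] = {}
    for line in text.splitlines():
        # Skip the '#subsys_name hierarchy num_cgroups enabled' header and blanks.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore malformed lines rather than guessing
        enabled[fields[0]] = fields[3] == "1"
    return enabled


def kernel_supports_cpu_mem_cgroups(text: str) -> bool:
    """Rough check: are the 'cpu' and 'memory' subsystems present and enabled?"""
    subsystems = parse_proc_cgroups(text)
    return subsystems.get("cpu", False) and subsystems.get("memory", False)
```

On an agent you would feed it the real file, e.g. kernel_supports_cpu_mem_cgroups(open("/proc/cgroups").read()).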

{noformat}
Usage: mesos-slave [options]

...
  --[no-]cgroups_cpu_enable_pids_and_tids_count     Cgroups feature flag to enable counting of processes and threads
                                                    inside a container.
                                                    (default: false)
  --[no-]cgroups_enable_cfs                         Cgroups feature flag to enable hard limits on CPU resources
                                                    via the CFS bandwidth limiting subfeature.
                                                    (default: false)
  --cgroups_hierarchy=VALUE                         The path to the cgroups hierarchy root
                                                    (default: /sys/fs/cgroup)
  --[no-]cgroups_limit_swap                         Cgroups feature flag to enable memory limits on both memory and
                                                    swap instead of just memory.
                                                    (default: false)
  --cgroups_root=VALUE                              Name of the root cgroup
                                                    (default: mesos)
  --container_disk_watch_interval=VALUE             The interval between disk quota checks for containers. This flag is
                                                    used for the 'posix/disk' isolator. (default: 15secs)
  --containerizer_path=VALUE                        The path to the external containerizer executable used when
                                                    external isolation is activated (--isolation=external).
  --containerizers=VALUE                            Comma-separated list of containerizer implementations
                                                    to compose in order to provide containerization.
                                                    Available options are 'mesos', 'external', and
                                                    'docker' (on Linux). The order the containerizers
                                                    are specified is the order they are tried
                                                    (--containerizers=mesos).
                                                    (default: mesos)
...
  --isolation=VALUE                                 Isolation mechanisms to use, e.g., 'posix/cpu,posix/mem', or
                                                    'cgroups/cpu,cgroups/mem', or network/port_mapping
                                                    (configure with flag: --with-network-isolator to enable),
                                                    or 'external', or load an alternate isolator module using
                                                    the --modules flag. Note that this flag is only relevant
                                                    for the Mesos Containerizer. (default: posix/cpu,posix/mem)
...
{noformat}
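Putting those flags together, an agent invocation that switches to cgroups-based enforcement might look roughly like the following. This is a sketch that only assembles the argument list (the helper names are hypothetical); it uses nothing beyond the flags listed above, so the master address, work directory and whatever else your agents already pass still need to be added. Whether you also want hard CPU caps ({{--cgroups_enable_cfs}}) or swap counted against the memory limit ({{--cgroups_limit_swap}}) is a per-cluster choice.

```python
import shlex
from typing import List


def mesos_slave_args(hard_cpu_limits: bool = True,
                     limit_swap: bool = False) -> List[str]:
    """Hypothetical helper: mesos-slave (0.26.0) args enabling cgroups isolation.

    Only flags from the help text are used; master address, work
    directory and anything else the agent needs are left out on purpose.
    """
    args = [
        "mesos-slave",
        # Swap the default 'posix/cpu,posix/mem' isolators for the cgroups ones.
        "--isolation=cgroups/cpu,cgroups/mem",
        # These two just restate the defaults so the cgroup layout is explicit.
        "--cgroups_hierarchy=/sys/fs/cgroup",
        "--cgroups_root=mesos",
    ]
    if hard_cpu_limits:
        # Hard CPU caps via CFS bandwidth limiting (default: false).
        args.append("--cgroups_enable_cfs")
    if limit_swap:
        # Apply the memory limit to memory + swap (default: false).
        args.append("--cgroups_limit_swap")
    return args


def as_command_line(args: List[str]) -> str:
    """Join the arguments into a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in args)
```

Boolean flags are passed in their bare {{--flag}} form here; the {{--no-}} prefix shown in the help text is how you would turn one off explicitly.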


> No Memory and CPU Enforcement
> -----------------------------
>
>                 Key: AURORA-1662
>                 URL: https://issues.apache.org/jira/browse/AURORA-1662
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>    Affects Versions: 0.11.0
>            Reporter: zane silver
>
> I'm running a job that is consuming more memory (RAM) than it was allocated. The Mesos and Aurora UIs properly display the memory utilization vs. the allocated/requested amount. However, the executor is not stopped once the job exceeds its limit. There appears to be no enforcement.
> Looking at the source, it also seems that only disk usage is enforced: in the ResourceManager status() method (src/main/python/apache/aurora/executor/common/resource_manager.py), only disk is explicitly checked.
> I feel like I must be missing something and that the enforcement for CPU and memory is actually elsewhere. If not, this is an easy fix.
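As for where CPU and memory enforcement lives: with the cgroups isolators the limits are written into the task's cgroup by the Mesos agent and enforced by the kernel, not checked by the executor's ResourceManager. One way to confirm enforcement is actually in effect is to inspect a running task's cgroup files. The sketch below only computes where those files would be and what CFS quota a CPU request should translate into. The path layout (one cgroup v1 mount per subsystem under {{--cgroups_hierarchy}}, a per-container cgroup under {{--cgroups_root}}) and the 100 ms CFS period are assumptions to verify on your agents, and the helper names are hypothetical.

```python
import os
from typing import Dict


def cgroup_limit_files(container_id: str,
                       hierarchy: str = "/sys/fs/cgroup",
                       root: str = "mesos") -> Dict[str, str]:
    """Where (we assume) a container's limits live on a cgroup v1 host.

    Assumes one mount per subsystem under `hierarchy` and a per-container
    cgroup named <root>/<container_id>; verify against your agents.
    """
    def path(subsystem: str, filename: str) -> str:
        return os.path.join(hierarchy, subsystem, root, container_id, filename)

    return {
        "memory_limit": path("memory", "memory.limit_in_bytes"),
        "cfs_quota": path("cpu", "cpu.cfs_quota_us"),
        "cfs_period": path("cpu", "cpu.cfs_period_us"),
    }


def expected_cfs_quota_us(cpus: float, period_us: int = 100_000) -> int:
    """CPU time (microseconds) per CFS period for a task that requested `cpus`.

    The 100 ms default period is an assumption; read cpu.cfs_period_us
    on the host for the real value.
    """
    return int(round(cpus * period_us))
```

Comparing the contents of those files with the task's request shows whether limits are set at all: memory.limit_in_bytes is in bytes, and a cpu.cfs_quota_us of -1 means no CFS cap is applied.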



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
