aurora-commits mailing list archives

From s...@apache.org
Subject aurora git commit: Introduce a flag to treat RAM as a revocable resources
Date Mon, 12 Sep 2016 22:09:58 GMT
Repository: aurora
Updated Branches:
  refs/heads/master b429612ef -> c4903d873


Introduce a flag to treat RAM as a revocable resources

We plan to open source a very simple Mesos ResourceEstimator and QosController that supports
RAM and CPU oversubscription (ETA ~2 weeks). We have been using it internally with a patched
Aurora version where the hardcoded `isMesosRevocable` flag of RAM has been set to `true`.
This patch makes this behaviour configurable.

Reviewed at https://reviews.apache.org/r/51807/
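
The sketch below illustrates why the flags are read from static fields rather than injected: enum constants are constructed once, when the enum class is initialized, so the flag value has to be available at that moment. This is a simplified illustration, not the Aurora code. `Flag`, `Settings`, `Resource` and `RevocableDemo` are hypothetical stand-ins for `Arg`, `ResourceSettings` and `ResourceType`, and the `set` call only simulates command-line parsing.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a parsed command-line flag (not Aurora's Arg class).
final class Flag {
  private final AtomicBoolean value;

  Flag(boolean defaultValue) {
    this.value = new AtomicBoolean(defaultValue);
  }

  void set(boolean newValue) {
    value.set(newValue);
  }

  boolean get() {
    return value.get();
  }
}

// Static holder for the flags, mirroring the shape of ResourceSettings.
final class Settings {
  static final Flag ENABLE_REVOCABLE_CPUS = new Flag(true);
  static final Flag ENABLE_REVOCABLE_RAM = new Flag(false);

  private Settings() {
  }
}

// Each constant reads its flag exactly once, when the enum class is initialized.
enum Resource {
  CPUS(Settings.ENABLE_REVOCABLE_CPUS.get()),
  RAM_MB(Settings.ENABLE_REVOCABLE_RAM.get()),
  DISK_MB(false);

  private final boolean revocable;

  Resource(boolean revocable) {
    this.revocable = revocable;
  }

  boolean isRevocable() {
    return revocable;
  }
}

final class RevocableDemo {
  public static void main(String[] args) {
    // Simulated flag parsing: this must happen before Resource is first used.
    Settings.ENABLE_REVOCABLE_RAM.set(true);
    System.out.println(Resource.RAM_MB.isRevocable());

    // Changing the flag afterwards has no effect on the already-built constants.
    Settings.ENABLE_REVOCABLE_RAM.set(false);
    System.out.println(Resource.RAM_MB.isRevocable());
  }
}
```

In this sketch both calls print `true`, because the enum captured the flag value at first use. The practical consequence is that flags have to be parsed before anything touches the enum.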


Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/c4903d87
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/c4903d87
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/c4903d87

Branch: refs/heads/master
Commit: c4903d873d090549ebdf9a07110851b5aad7d978
Parents: b429612
Author: Stephan Erb <serb@apache.org>
Authored: Tue Sep 13 00:09:29 2016 +0200
Committer: Stephan Erb <serb@apache.org>
Committed: Tue Sep 13 00:09:29 2016 +0200

----------------------------------------------------------------------
 RELEASE-NOTES.md                                |  2 ++
 docs/features/resource-isolation.md             |  6 ++--
 docs/operations/configuration.md                |  5 +++
 docs/reference/scheduler-configuration.md       | 24 ++++++++++---
 .../scheduler/resources/ResourceSettings.java   | 37 ++++++++++++++++++++
 .../scheduler/resources/ResourceType.java       |  6 ++--
 6 files changed, 70 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/RELEASE-NOTES.md
----------------------------------------------------------------------
diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md
index bbf7198..4476d52 100644
--- a/RELEASE-NOTES.md
+++ b/RELEASE-NOTES.md
@@ -35,6 +35,8 @@
  schedulers up. A rolling upgrade would result in no leading scheduler for the duration of the
   roll which could be confusing to monitor and debug.
 - Add a new MTTS (Median Time To Starting) metric in addition to MTTA and MTTR.
+- In addition to CPU resources, RAM resources can now be treated as revocable via the scheduler
+  commandline flag `-enable_revocable_ram`.
 
 ### Deprecations and removals:
 

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/docs/features/resource-isolation.md
----------------------------------------------------------------------
diff --git a/docs/features/resource-isolation.md b/docs/features/resource-isolation.md
index 01c5b40..503f2de 100644
--- a/docs/features/resource-isolation.md
+++ b/docs/features/resource-isolation.md
@@ -168,9 +168,9 @@ via the concept of revocable tasks. In contrast to non-revocable tasks, revocabl
 Mesos reserves the right to throttle or even kill them if they might affect existing high-priority
 user-facing services.
 
-As of today, the only revocable resource supported by Aurora are CPU resources. A job can opt-in to
-use those by specifying the `revocable` [Configuration Tier](../features/multitenancy.md#configuration-tiers).
-A revocable job will only be scheduled using revocable CPU resources, even if there are plenty of
+As of today, the only revocable resource supported by Aurora are CPU and RAM resources. A job can
+opt-in to use those by specifying the `revocable` [Configuration Tier](../features/multitenancy.md#configuration-tiers).
+A revocable job will only be scheduled using revocable resources, even if there are plenty of
 non-revocable resources available.
 
 The Aurora scheduler must be [configured to receive revocable offers](../operations/configuration.md#resource-isolation)

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/docs/operations/configuration.md
----------------------------------------------------------------------
diff --git a/docs/operations/configuration.md b/docs/operations/configuration.md
index 90dde57..203f3be 100644
--- a/docs/operations/configuration.md
+++ b/docs/operations/configuration.md
@@ -126,6 +126,11 @@ and then set set this Aurora scheduler flag to allow receiving revocable Mesos o
 
     -receive_revocable_resources=true
 
+Both CPUs and RAM are supported as revocable resources. The former is enabled by the default,
+the latter needs to be enabled via:
+
+    -enable_revocable_ram=true
+
 Unless you want to use the [default](../../src/main/resources/org/apache/aurora/scheduler/tiers.json)
 tier configuration, you will also have to specify a file path:
 

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/docs/reference/scheduler-configuration.md
----------------------------------------------------------------------
diff --git a/docs/reference/scheduler-configuration.md b/docs/reference/scheduler-configuration.md
index 87d2cde..31be714 100644
--- a/docs/reference/scheduler-configuration.md
+++ b/docs/reference/scheduler-configuration.md
@@ -22,6 +22,8 @@ Required flags:
 	Max number of idle connections to the database via MyBatis
 -framework_authentication_file
 	Properties file which contains framework credentials to authenticate with Mesosmaster. Must contain the properties 'aurora_authentication_principal' and 'aurora_authentication_secret'.
+-ip
+	The ip address to listen. If not set, the scheduler will listen on all interfaces.
 -mesos_master_address [not null]
 	Address for the mesos master, can be a socket address or zookeeper path.
 -mesos_role
@@ -34,12 +36,16 @@ Required flags:
 	Path to the thermos executor entry point.
 -tier_config [file must be readable]
 	Configuration file defining supported task tiers, task traits and behaviors.
+-webhook_config [file must exist, file must be readable]
+	Path to webhook configuration file.
 -zk_endpoints [must have at least 1 item]
 	Endpoint specification for the ZooKeeper servers.
 
 Optional flags:
 -allow_docker_parameters (default false)
 	Allow to pass docker container parameters in the job.
+-allow_gpu_resource (default false)
+	Allow jobs to request Mesos GPU resource.
 -allowed_container_types (default [MESOS])
 	Container types that are allowed to be used by jobs.
 -async_slot_stat_update_interval (default (1, mins))
@@ -76,10 +82,16 @@ Optional flags:
 	List of domains for which CORS support should be enabled.
 -enable_h2_console (default false)
 	Enable H2 DB management console.
+-enable_mesos_fetcher (default false)
+	Allow jobs to pass URIs to the Mesos Fetcher. Note that enabling this feature could pose a privilege escalation threat.
 -enable_preemptor (default true)
 	Enable the preemptor and preemption
+-enable_revocable_cpus (default true)
+	Treat CPUs as a revocable resource.
+-enable_revocable_ram (default false)
+	Treat RAM as a revocable resource.
 -executor_user (default root)
-	User to start the executor. Defaults to "root". Set this to an unprivileged user if the mesos master was started with "--no-root_submissions". If set to anything other than "root", the executor will ignore the "role" setting for jobs since it can't use setuid() anymore. This means that all your jobs will run under the specified user and the user has to exist on the mesos slaves.
+	User to start the executor. Defaults to "root". Set this to an unprivileged user if the mesos master was started with "--no-root_submissions". If set to anything other than "root", the executor will ignore the "role" setting for jobs since it can't use setuid() anymore. This means that all your jobs will run under the specified user and the user has to exist on the Mesos agents.
 -first_schedule_delay (default (1, ms))
 	Initial amount of time to wait before first attempting to schedule a PENDING task.
 -flapping_task_threshold (default (5, mins))
@@ -163,7 +175,7 @@ Optional flags:
 -offer_hold_jitter_window (default (1, mins))
 	Maximum amount of random jitter to add to the offer hold time window.
 -offer_reservation_duration (default (3, mins))
-	Time to reserve a slave's offers while trying to satisfy a task preempting another.
+	Time to reserve a agent's offers while trying to satisfy a task preempting another.
 -populate_discovery_info (default false)
 	If true, Aurora populates DiscoveryInfo field of Mesos TaskInfo.
 -preemption_delay (default (3, mins))
@@ -174,6 +186,10 @@ Optional flags:
 	Time interval between pending task preemption slot searches.
 -receive_revocable_resources (default false)
 	Allows receiving revocable resource offers from Mesos.
+-reconciliation_explicit_batch_interval (default (5, secs))
+	Interval between explicit batch reconciliation requests.
+-reconciliation_explicit_batch_size (default 1000) [must be > 0]
+	Number of tasks in a single batch request sent to Mesos for explicit reconciliation.
 -reconciliation_explicit_interval (default (60, mins))
 	Interval on which scheduler will ask Mesos for status updates of all non-terminal tasks known to scheduler.
 -reconciliation_implicit_interval (default (60, mins))
@@ -186,7 +202,7 @@ Optional flags:
 	If false, Docker tasks may run without an executor (EXPERIMENTAL)
 -shiro_ini_path
 	Path to shiro.ini for authentication and authorization configuration.
--shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@13c9d689])
+-shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])
 	Guice modules for configuring Shiro Realms.
 -sla_non_prod_metrics (default [])
 	Metric categories collected for non production tasks.
@@ -218,8 +234,6 @@ Optional flags:
 	Whether to use the experimental database-backed task store.
 -viz_job_url_prefix (default )
 	URL prefix for job container stats.
--webhook_config [file must be readable]
-    File to configure a HTTP webhook to receive task state change events.
 -zk_chroot_path
 	chroot path to use for the ZooKeeper connections
 -zk_digest_credentials

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java b/src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java
new file mode 100644
index 0000000..c49fd06
--- /dev/null
+++ b/src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java
@@ -0,0 +1,37 @@
+/**
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.aurora.scheduler.resources;
+
+import org.apache.aurora.common.args.Arg;
+import org.apache.aurora.common.args.CmdLine;
+
+/**
+ * Control knobs for how Aurora treats different resource types.
+ *
+ * The command line handling seen here is non-standard. Normally we declare them in modules
+ * and then inject them via 'settings' classes. Unfortunately, this does not work here as we
+ * would need to perform the injection into the ResourceType enum. Enums are picky in that regard.
+ */
+final class ResourceSettings {
+
+  @CmdLine(name = "enable_revocable_cpus", help = "Treat CPUs as a revocable resource.")
+  static final Arg<Boolean> ENABLE_REVOCABLE_CPUS = Arg.create(true);
+
+  @CmdLine(name = "enable_revocable_ram", help = "Treat RAM as a revocable resource.")
+  static final Arg<Boolean> ENABLE_REVOCABLE_RAM = Arg.create(false);
+
+  private ResourceSettings() {
+
+  }
+}

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java b/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java
index 4c102a3..e1a5dce 100644
--- a/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java
+++ b/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java
@@ -36,6 +36,8 @@ import static org.apache.aurora.scheduler.resources.AuroraResourceConverter.STRI
 import static org.apache.aurora.scheduler.resources.MesosResourceConverter.RANGES;
 import static org.apache.aurora.scheduler.resources.MesosResourceConverter.SCALAR;
 import static org.apache.aurora.scheduler.resources.ResourceMapper.PORT_MAPPER;
+import static org.apache.aurora.scheduler.resources.ResourceSettings.ENABLE_REVOCABLE_CPUS;
+import static org.apache.aurora.scheduler.resources.ResourceSettings.ENABLE_REVOCABLE_RAM;
 
 /**
  * Describes Mesos resource types and their Aurora traits.
@@ -55,7 +57,7 @@ public enum ResourceType implements TEnum {
       "core(s)",
       16,
       false,
-      true),
+      ENABLE_REVOCABLE_CPUS.get()),
 
   /**
    * RAM resource.
@@ -70,7 +72,7 @@ public enum ResourceType implements TEnum {
       "MB",
       Amount.of(24, GB).as(MB),
       false,
-      false),
+      ENABLE_REVOCABLE_RAM.get()),
 
   /**
    * DISK resource.

