aurora-commits mailing list archives

From wfar...@apache.org
Subject aurora git commit: Replace incorrect/misleading use of constraints with best practices doc.
Date Fri, 11 Sep 2015 18:00:51 GMT
Repository: aurora
Updated Branches:
  refs/heads/master 140d74d65 -> 7c7dcb265


Replace incorrect/misleading use of constraints with best practices doc.

Reviewed at https://reviews.apache.org/r/38302/


Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/7c7dcb26
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/7c7dcb26
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/7c7dcb26

Branch: refs/heads/master
Commit: 7c7dcb26593baf9e5941de40e34d4aa4fe1ab95c
Parents: 140d74d
Author: Bill Farner <wfarner@apache.org>
Authored: Fri Sep 11 11:00:30 2015 -0700
Committer: Bill Farner <wfarner@apache.org>
Committed: Fri Sep 11 11:00:44 2015 -0700

----------------------------------------------------------------------
 docs/deploying-aurora-scheduler.md              | 33 ++++++++++----------
 examples/vagrant/upstart/mesos-slave.conf       |  1 -
 .../apache/aurora/e2e/http/http_example.aurora  |  4 ---
 .../aurora/e2e/http/http_example_updated.aurora |  9 ++----
 4 files changed, 18 insertions(+), 29 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/aurora/blob/7c7dcb26/docs/deploying-aurora-scheduler.md
----------------------------------------------------------------------
diff --git a/docs/deploying-aurora-scheduler.md b/docs/deploying-aurora-scheduler.md
index 73f7b19..8db0e61 100644
--- a/docs/deploying-aurora-scheduler.md
+++ b/docs/deploying-aurora-scheduler.md
@@ -21,6 +21,8 @@ machines.  This guide helps you get the scheduler set up and troubleshoot some c
     - [Dedicated attribute](#dedicated-attribute)
       - [Syntax](#syntax)
       - [Example](#example)
+- [Best practices](#best-practices)
+  - [Diversity](#diversity)
 - [Common problems](#common-problems)
   - [Replicated log not initialized](#replicated-log-not-initialized)
     - [Symptoms](#symptoms)
@@ -28,9 +30,6 @@ machines.  This guide helps you get the scheduler set up and troubleshoot some c
   - [Scheduler not registered](#scheduler-not-registered)
     - [Symptoms](#symptoms-1)
     - [Solution](#solution-1)
-  - [Tasks are stuck in PENDING forever](#tasks-are-stuck-in-pending-forever)
-    - [Symptoms](#symptoms-2)
-    - [Solution](#solution-2)
 - [Changing Scheduler Quorum Size](#changing-scheduler-quorum-size)
     - [Preparation](#preparation)
     - [Adding New Schedulers](#adding-new-schedulers)
@@ -220,7 +219,7 @@ enforce this.
 ##### Example
 Consider the following slave command line:
 
-    mesos-slave --attributes="host:$HOST;rack:$RACK;dedicated:db_team/redis" ...
+    mesos-slave --attributes="dedicated:db_team/redis" ...
 
 And this job configuration:
 
@@ -237,6 +236,19 @@ The job configuration is indicating that it should only be scheduled on slaves w
 `dedicated:db_team/redis`.  Additionally, Aurora will prevent any tasks that do _not_ have that
 constraint from running on those slaves.
 
+## Best practices
+### Diversity
+Data centers are often organized with hierarchical failure domains.  Common failure domains
+include hosts, racks, rows, and PDUs.  If you have this information available, it is wise to tag
+the mesos-slave with them as
+[attributes](https://mesos.apache.org/documentation/attributes-resources/).
+
+When it comes time to schedule jobs, Aurora will automatically spread them across the failure
+domains as specified in the
+[job configuration](configuration-reference.md#specifying-scheduling-constraints).
+
+Note: in virtualized environments like EC2, the only attribute that usually makes sense for this
+purpose is `host`.
 
 ## Common problems
 So you've started your first cluster and are running into some issues? We've collected some common
@@ -278,19 +290,6 @@ is the same as the one on the scheduler:
 
     -mesos_master_address=zk://$ZK_HOST:2181/mesos/master
 
-### Tasks are stuck in `PENDING` forever
-
-#### Symptoms
-The scheduler is registered, and [receiving offers](monitoring.md#scheduler_resource_offers),
-but tasks are perpetually shown as `PENDING - Constraint not satisfied: host`.
-
-#### Solution
-Check that your slaves are configured with `host` and `rack` attributes.  Aurora requires that
-slaves are tagged with these two common failure domains to ensure that it can safely place tasks
-such that jobs are resilient to failure.
-
-See our [vagrant example](examples/vagrant/upstart/mesos-slave.conf) for details.
-
 ## Changing Scheduler Quorum Size
 Special care needs to be taken when changing the size of the Aurora scheduler quorum.
 Since Aurora uses a Mesos replicated log, similar steps need to be followed as when

http://git-wip-us.apache.org/repos/asf/aurora/blob/7c7dcb26/examples/vagrant/upstart/mesos-slave.conf
----------------------------------------------------------------------
diff --git a/examples/vagrant/upstart/mesos-slave.conf b/examples/vagrant/upstart/mesos-slave.conf
index 9af680e..1ef059b 100644
--- a/examples/vagrant/upstart/mesos-slave.conf
+++ b/examples/vagrant/upstart/mesos-slave.conf
@@ -26,7 +26,6 @@ env ZK_HOST=192.168.33.7
 exec /usr/sbin/mesos-slave --master=zk://$ZK_HOST:2181/mesos/master \
   --ip=$MY_HOST \
   --hostname=$MY_HOST \
-  --attributes="host:$MY_HOST;rack:a" \
   --resources="cpus:4;mem:1024;disk:20000" \
   --work_dir="/var/lib/mesos" \
   --containerizers=docker,mesos \

http://git-wip-us.apache.org/repos/asf/aurora/blob/7c7dcb26/src/test/sh/org/apache/aurora/e2e/http/http_example.aurora
----------------------------------------------------------------------
diff --git a/src/test/sh/org/apache/aurora/e2e/http/http_example.aurora b/src/test/sh/org/apache/aurora/e2e/http/http_example.aurora
index d7bf108..dc55109 100644
--- a/src/test/sh/org/apache/aurora/e2e/http/http_example.aurora
+++ b/src/test/sh/org/apache/aurora/e2e/http/http_example.aurora
@@ -43,10 +43,6 @@ job = Service(
   role = getpass.getuser(),
   environment = 'test',
   contact = '{{role}}@localhost',
-  # Since there is only one slave in devcluster allow all instances to run there.
-  constraints = {
-    'host': 'limit:2',
-  },
   announce = Announcer(),
 )
 

http://git-wip-us.apache.org/repos/asf/aurora/blob/7c7dcb26/src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora
----------------------------------------------------------------------
diff --git a/src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora b/src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora
index c973966..f098de9 100644
--- a/src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora
+++ b/src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora
@@ -25,11 +25,10 @@ stage_server = Process(
   cmdline = '{{cmd}}'
 )
 
-test_task = Task(
+test_task = SequentialTask(
   name = 'http_example',
   resources = Resources(cpu=0.5, ram=34*MB, disk=64*MB),
-  processes = [stage_server, run_server],
-  constraints = order(stage_server, run_server))
+  processes = [stage_server, run_server])
 
 update_config = UpdateConfig(watch_secs=10, batch_size=3)
 health_check_config = HealthCheckConfig(initial_interval_secs=5, interval_secs=1)
@@ -43,10 +42,6 @@ job = Service(
   role = getpass.getuser(),
   environment = 'test',
   contact = '{{role}}@localhost',
-  # Since there is only one slave in devcluster allow all instances to run there.
-  constraints = {
-    'host': 'limit:4',
-  },
   announce = Announcer(),
 )
 


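Editorial note on the `'host': 'limit:N'` constraints removed from the two e2e configs in this commit: the sketch below is a minimal model, assuming `limit:N` means that at most N instances of a job may be placed on slaves sharing the same attribute value. The function name and the sample placements are hypothetical and are not part of Aurora.

```python
from collections import Counter


def satisfies_limit(attribute_values, limit):
    """Return True if no single attribute value hosts more than `limit` instances.

    `attribute_values` holds one entry per job instance: the value of the
    constrained attribute (for example `host`) on the slave running it.
    This is an illustrative model, not Aurora's scheduler code.
    """
    counts = Counter(attribute_values)
    return all(count <= limit for count in counts.values())


# Hypothetical single-slave cluster: every instance lands on the same host.
one_slave = ["192.168.33.7", "192.168.33.7"]

# Under this model, two instances on one host pass 'limit:2' but fail 'limit:1'.
fits_limit_2 = satisfies_limit(one_slave, 2)
fits_limit_1 = satisfies_limit(one_slave, 1)
```

Under this reading, a per-host limit only matters when more instances than the limit would otherwise share a host, which is why a single-slave test cluster needed the limit raised while the constraint was in place.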