aurora-commits mailing list archives

From san...@apache.org
Subject svn commit: r1799388 [9/9] - in /aurora/site: data/ publish/ publish/blog/ publish/documentation/0.10.0/ publish/documentation/0.10.0/build-system/ publish/documentation/0.10.0/client-cluster-configuration/ publish/documentation/0.10.0/client-commands/...
Date Wed, 21 Jun 2017 06:29:02 GMT
Modified: aurora/site/source/documentation/latest/development/db-migration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/db-migration.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/db-migration.md (original)
+++ aurora/site/source/documentation/latest/development/db-migration.md Wed Jun 21 06:28:50 2017
@@ -14,7 +14,7 @@ When adding or altering tables or changi
 [schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql), a new
 migration class should be created under the org.apache.aurora.scheduler.storage.db.migration
 package. The class should implement the [MigrationScript](https://github.com/mybatis/migrations/blob/master/src/main/java/org/apache/ibatis/migration/MigrationScript.java)
-interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.17.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
+interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.18.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
 as an example). The upgrade and downgrade scripts are defined in this class. When restoring a
 snapshot the list of migrations on the classpath is compared to the list of applied changes in the
 DB. Any changes that have not yet been applied are executed and their downgrade script is stored

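The restore behaviour described in the db-migration hunk above (compare the migrations on the classpath with the changes recorded in the DB, run the missing upgrades, keep their downgrade scripts) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions only: Aurora does this in Java via MyBatis Migrations, and the `Migration` class, the `reconcile` function and the dict-based change log are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Migration:
    """Hypothetical stand-in for a MigrationScript implementation."""
    change_id: int
    upgrade_sql: str
    downgrade_sql: str


def reconcile(classpath: List[Migration], applied: Dict[int, str]) -> List[int]:
    """Apply every migration whose id is not yet recorded in `applied`.

    `applied` maps change id -> stored downgrade script and is updated in place.
    Returns the newly applied ids in ascending order.
    """
    newly_applied = []
    for migration in sorted(classpath, key=lambda m: m.change_id):
        if migration.change_id in applied:
            continue  # already part of the restored snapshot
        # A real implementation would execute migration.upgrade_sql here.
        applied[migration.change_id] = migration.downgrade_sql
        newly_applied.append(migration.change_id)
    return newly_applied
```

Keeping the downgrade script next to each applied id is what would let an older scheduler build roll the schema back later, which is the reason the text gives for storing it.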
Modified: aurora/site/source/documentation/latest/development/design-documents.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/design-documents.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/design-documents.md (original)
+++ aurora/site/source/documentation/latest/development/design-documents.md Wed Jun 21 06:28:50 2017
@@ -18,5 +18,6 @@ Current and past documents:
 * [Supporting the Mesos Universal Containerizer](https://docs.google.com/document/d/111T09NBF2zjjl7HE95xglsDpRdKoZqhCRM5hHmOfTLA/edit?usp=sharing)
 * [Tier Management In Apache Aurora](https://docs.google.com/document/d/1erszT-HsWf1zCIfhbqHlsotHxWUvDyI2xUwNQQQxLgs/edit?usp=sharing)
 * [Ubiquitous Jobs](https://docs.google.com/document/d/12hr6GnUZU3mc7xsWRzMi3nQILGB-3vyUxvbG-6YmvdE/edit)
+* [Pluggable Scheduling](https://docs.google.com/document/d/1fVHLt9AF-YbOCVCDMQmi5DATVusn-tqY8DldKbjVEm0/edit)
 
 Design documents can be found in the Aurora issue tracker via the query [`project = AURORA AND text ~ "docs.google.com" ORDER BY created`](https://issues.apache.org/jira/browse/AURORA-1528?jql=project%20%3D%20AURORA%20AND%20text%20~%20%22docs.google.com%22%20ORDER%20BY%20created).

Modified: aurora/site/source/documentation/latest/development/thrift.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/thrift.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/thrift.md (original)
+++ aurora/site/source/documentation/latest/development/thrift.md Wed Jun 21 06:28:50 2017
@@ -6,7 +6,7 @@ client/server RPC protocol as well as fo
 correctly handling additions and renames of the existing members, field removals must be done
 carefully to ensure backwards compatibility and provide predictable deprecation cycle. This
 document describes general guidelines for making Thrift schema changes to the existing fields in
-[api.thrift](https://github.com/apache/aurora/blob/rel/0.17.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+[api.thrift](https://github.com/apache/aurora/blob/rel/0.18.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
 
 It is highly recommended to go through the
 [Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on
@@ -33,7 +33,7 @@ communicate with scheduler/client from v
 * Add a new field as an eventual replacement of the old one and implement a dual read/write
 anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both columns
 are marked as `NOT NULL`
-* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.17.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
+* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.18.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
 the affected struct is stored in Aurora scheduler storage. If so, it's almost certainly also
 necessary to perform a [DB migration](../db-migration/).
 * Add a deprecation jira ticket into the vCurrent+1 release candidate

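The "dual read/write" step from the Thrift guidelines above can be illustrated with a small Python sketch. Plain dataclasses stand in for Thrift-generated types, and the struct and field names (`TaskInfo`, `old_name`, `new_name`) are hypothetical: writers populate both fields, while readers prefer the new field and fall back to the old one, so peers built against either schema keep working during the deprecation cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInfo:
    """Hypothetical struct with a deprecated field and its replacement."""
    old_name: Optional[str] = None  # deprecated, still populated for older peers
    new_name: Optional[str] = None  # eventual replacement


def write_name(info: TaskInfo, value: str) -> None:
    # Dual write: keep both fields in sync until the old one is removed.
    info.old_name = value
    info.new_name = value


def read_name(info: TaskInfo) -> Optional[str]:
    # Dual read: prefer the new field, fall back to data written by old peers.
    return info.new_name if info.new_name is not None else info.old_name
```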
Modified: aurora/site/source/documentation/latest/features/custom-executors.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/custom-executors.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/custom-executors.md (original)
+++ aurora/site/source/documentation/latest/features/custom-executors.md Wed Jun 21 06:28:50 2017
@@ -36,6 +36,7 @@ uris (optional)          | List of resou
 shell (optional)         | Run executor via shell.
 
 A note on the command property (from [mesos.proto](https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto)):
+
 ```
 1) If 'shell == true', the command will be launched via shell
    (i.e., /bin/sh -c 'value'). The 'value' specified will be
@@ -68,14 +69,15 @@ scalar (required)    | Value in float fo
 
 ### volume_mounts (list)
 
-Property                  | Description
-------------------------  | ---------------------------------
-host_path (required)      | Host path to mount inside the container.
-container_path (required) | Path inside the container where `host_path` will be mounted.
-mode (required)           | Mode in which to mount the volume, Read-Write (RW) or Read-Only (RO).
+Property                     | Description
+---------------------------  | ---------------------------------
+host_path (required)         | Host path to mount inside the container.
+container_path (required)    | Path inside the container where `host_path` will be mounted.
+mode (required)              | Mode in which to mount the volume, Read-Write (RW) or Read-Only (RO).
 
 A sample configuration is as follows:
-```
+
+```json
 [
     {
       "executor": {
@@ -135,7 +137,6 @@ A sample configuration is as follows:
       "task_prefix": "my-executor-"
     }
 ]
-
 ```
 
 It should be noted that if you do not use Thermos or a Thermos based executor, links in the scheduler's

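The `volume_mounts` table in the custom-executors hunk above implies three required keys and a two-valued `mode`. A minimal Python sketch of checking entries against that table before writing the executor config file follows; the `validate_volume_mounts` helper is hypothetical and not part of Aurora, which parses the custom executor configuration itself.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("host_path", "container_path", "mode")
VALID_MODES = ("RW", "RO")


def validate_volume_mounts(mounts: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the entries look valid."""
    problems = []
    for index, mount in enumerate(mounts):
        for key in REQUIRED_KEYS:
            if key not in mount:
                problems.append(f"volume_mounts[{index}]: missing '{key}'")
        mode = mount.get("mode")
        if mode is not None and mode not in VALID_MODES:
            problems.append(
                f"volume_mounts[{index}]: mode must be RW or RO, got {mode!r}")
    return problems
```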
Modified: aurora/site/source/documentation/latest/features/job-updates.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/job-updates.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/job-updates.md (original)
+++ aurora/site/source/documentation/latest/features/job-updates.md Wed Jun 21 06:28:50 2017
@@ -37,16 +37,16 @@ in the current (possibly partially-updat
 For a configuration update, the Aurora Scheduler calculates required changes
 by examining the current job config state and the new desired job config.
 It then starts a *rolling batched update process* by going through every batch
-and performing these operations:
+and performing these operations, in order:
 
-- If an instance is present in the scheduler but isn't in the new config,
-  then that instance is killed.
 - If an instance is not present in the scheduler but is present in
   the new config, then the instance is created.
 - If an instance is present in both the scheduler and the new config, then
   the scheduler diffs both task configs. If it detects any changes, it
   performs an instance update by killing the old config instance and adds
   the new config instance.
+- If an instance is present in the scheduler but isn't in the new config,
+  then that instance is killed.
 
 The Aurora Scheduler continues through the instance list until all tasks are
 updated and in `RUNNING`. If the scheduler determines the update is not going
@@ -70,7 +70,7 @@ acknowledging ("heartbeating") job updat
 service updates where explicit job health monitoring is vital during the entire job update
 lifecycle. Such job updates would rely on an external service (or a custom client) periodically
 pulsing an active coordinated job update via a
-[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.17.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.18.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
 
 A coordinated update is defined by setting a positive
 [pulse_interval_secs](../../reference/configuration/#updateconfig-objects) value in job configuration

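The reordered job-update list above (create, then update, then kill) can be sketched as a diff over instance ids for one batch. This is a simplified Python illustration: modelling each task config as a plain string and the `plan_batch` name are assumptions, and the real scheduler additionally waits for instances to reach `RUNNING`, handles rollback, and so on.

```python
from typing import Dict, List, Tuple


def plan_batch(current: Dict[int, str], desired: Dict[int, str],
               batch: List[int]) -> List[Tuple[str, int]]:
    """Return (action, instance_id) pairs for one batch, in execution order.

    `current` and `desired` map instance id -> task config (a string here).
    """
    creates, updates, kills = [], [], []
    for instance in sorted(batch):
        in_current = instance in current
        in_desired = instance in desired
        if in_desired and not in_current:
            creates.append(("create", instance))
        elif in_current and in_desired and current[instance] != desired[instance]:
            # Update = kill the old-config instance, then add the new-config one.
            updates.append(("update", instance))
        elif in_current and not in_desired:
            kills.append(("kill", instance))
    return creates + updates + kills
```

Instances whose config is unchanged produce no action, which matches the "if it detects any changes" condition in the text.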
Modified: aurora/site/source/documentation/latest/features/sla-metrics.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/sla-metrics.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/sla-metrics.md (original)
+++ aurora/site/source/documentation/latest/features/sla-metrics.md Wed Jun 21 06:28:50 2017
@@ -63,7 +63,7 @@ relevant to uptime calculations. By appl
 transition records, we can build a deterministic downtime trace for every given service instance.
 
 A task going through a state transition carries one of three possible SLA meanings
-(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
+(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/rel/0.18.0/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
 sla-to-task-state mapping):
 
 * Task is UP: starts a period where the task is considered to be up and running from the Aurora
@@ -110,7 +110,7 @@ metric that helps track the dependency o
 * Per job - `sla_<job_key>_mtta_ms`
 * Per cluster - `sla_cluster_mtta_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.18.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mtta_ms`
     * `sla_cpu_medium_mtta_ms`
@@ -147,7 +147,7 @@ for a task.*
 * Per job - `sla_<job_key>_mtts_ms`
 * Per cluster - `sla_cluster_mtts_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.18.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mtts_ms`
     * `sla_cpu_medium_mtts_ms`
@@ -182,7 +182,7 @@ reflecting on the overall time it takes
 * Per job - `sla_<job_key>_mttr_ms`
 * Per cluster - `sla_cluster_mttr_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.18.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mttr_ms`
     * `sla_cpu_medium_mttr_ms`

Modified: aurora/site/source/documentation/latest/features/webhooks.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/webhooks.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/webhooks.md (original)
+++ aurora/site/source/documentation/latest/features/webhooks.md Wed Jun 21 06:28:50 2017
@@ -19,6 +19,7 @@ Below is a sample configuration:
 ```
 
 And an example of a response that you will get back:
+
 ```json
 {
     "task":
@@ -77,4 +78,3 @@ And an example of a response that you wi
         },
         "oldState":{}}
 ```
-

Modified: aurora/site/source/documentation/latest/index.html.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/index.html.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/index.html.md (original)
+++ aurora/site/source/documentation/latest/index.html.md Wed Jun 21 06:28:50 2017
@@ -35,6 +35,8 @@ For those that wish to manage and fine-t
 
  * [Installation](operations/installation/)
  * [Configuration](operations/configuration/)
+ * [Upgrades](operations/upgrades/)
+ * [Troubleshooting](operations/troubleshooting/)
  * [Monitoring](operations/monitoring/)
  * [Security](operations/security/)
  * [Storage](operations/storage/)
@@ -54,6 +56,8 @@ The complete reference of commands, conf
     - [Client Hooks](reference/client-hooks/)
     - [Client Cluster Configuration](reference/client-cluster-configuration/)
  * [Scheduler Configuration](reference/scheduler-configuration/)
+ * [Observer Configuration](reference/observer-configuration/)
+ * [Endpoints](reference/scheduler-endpoints/)
 
 ## Additional Resources
  * [Tools integrating with Aurora](additional-resources/tools/)

Modified: aurora/site/source/documentation/latest/operations/backup-restore.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/backup-restore.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/operations/backup-restore.md (original)
+++ aurora/site/source/documentation/latest/operations/backup-restore.md Wed Jun 21 06:28:50 2017
@@ -3,7 +3,7 @@
 **Be sure to read the entire page before attempting to restore from a backup, as it may have
 unintended consequences.**
 
-# Summary
+## Summary
 
 The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an
 earlier, backed up, version and requires all schedulers to be taken down temporarily while
@@ -18,7 +18,7 @@ so any tasks that have been rescheduled
 Instructions below have been verified in [Vagrant environment](../../getting-started/vagrant/) and with minor
 syntax/path changes should be applicable to any Aurora cluster.
 
-# Preparation
+## Preparation
 
 Follow these steps to prepare the cluster for restoring from a backup:
 
@@ -54,7 +54,7 @@ accomplished by updating the following s
 
 * Restart all schedulers
 
-# Cleanup and re-initialize Mesos replicated log
+## Cleanup and re-initialize Mesos replicated log
 
 Get rid of the corrupted files and re-initialize Mesos replicated log:
 
@@ -63,7 +63,7 @@ Get rid of the corrupted files and re-in
 * Initialize Mesos replica's log file: `sudo mesos-log initialize --path=<-native_log_file_path>`
 * Start schedulers
 
-# Restore from backup
+## Restore from backup
 
 At this point the scheduler is ready to rehydrate from the backup:
 
@@ -87,5 +87,5 @@ See `aurora_admin help <command>` for us
 the provided backup snapshot and initiate a mandatory failover
 `aurora_admin scheduler_commit_recovery --bypass-leader-redirect  <cluster>`
 
-# Cleanup
+## Cleanup
 Undo any modification done during [Preparation](#preparation) sequence.

Modified: aurora/site/source/documentation/latest/operations/configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/configuration.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/operations/configuration.md (original)
+++ aurora/site/source/documentation/latest/operations/configuration.md Wed Jun 21 06:28:50 2017
@@ -29,7 +29,6 @@ Like Mesos, Aurora uses command-line fla
     # Environment variables controlling libmesos
     export JAVA_HOME=...
     export GLOG_v=1
-    # Port and public ip used to communicate with the Mesos master and for the replicated log
     export LIBPROCESS_PORT=8083
     export LIBPROCESS_IP=192.168.33.7
 
@@ -38,6 +37,36 @@ Like Mesos, Aurora uses command-line fla
 That way Aurora's current flags are visible in `ps` and in the `/vars` admin endpoint.
 
 
+## JVM Configuration
+
+JVM settings are dependent on your environment and cluster size. They might require
+custom tuning. As a starting point, we recommend:
+
+* Ensure the initial (`-Xms`) and maximum (`-Xmx`) heap sizes are identical to prevent heap resizing
+  at runtime.
+* Either `-XX:+UseConcMarkSweepGC` or `-XX:+UseG1GC -XX:+UseStringDeduplication` are
+  sane defaults for the garbage collector.
+* `-Djava.net.preferIPv4Stack=true` makes sense in most cases as well.
+
+
+## Network Configuration
+
+By default, Aurora binds to all interfaces and auto-discovers its hostname. To reduce ambiguity,
+it helps to set both explicitly:
+
+    -http_port=8081
+    -ip=192.168.33.7
+    -hostname="aurora1.us-east1.example.org"
+
+Two environment variables control the ip and port for the communication with the Mesos master
+and for the replicated log used by Aurora:
+
+    export LIBPROCESS_PORT=8083
+    export LIBPROCESS_IP=192.168.33.7
+
+It is important that those can be reached from all Mesos master and Aurora scheduler instances.
+
+
 ## Replicated Log Configuration
 
 Aurora schedulers use ZooKeeper to discover log replicas and elect a leader. Only one scheduler is
@@ -64,13 +93,18 @@ should be set to `3`.
 *Incorrectly setting this flag will cause data corruption to occur!*
 
 ### `-native_log_file_path`
-Location of the Mesos replicated log files. Consider allocating a dedicated disk (preferably SSD)
-for Mesos replicated log files to ensure optimal storage performance.
+Location of the Mesos replicated log files. For optimal and consistent performance, consider
+allocating a dedicated disk (preferably SSD) for the replicated log. Ensure that this disk is not
+used by anything else (e.g. no process logging) and in particular that it is a real disk
+and not just a partition.
+
+Even when a dedicated disk is used, switching the Linux kernel I/O scheduler from `CFQ` to `deadline`
+can further improve storage performance in Aurora ([see this ticket for details](https://issues.apache.org/jira/browse/AURORA-1211)).
 
 ### `-native_log_zk_group_path`
 ZooKeeper path used for Mesos replicated log quorum discovery.
 
-See [code](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
+See [code](https://github.com/apache/aurora/blob/rel/0.18.0/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
 other available Mesos replicated log configuration options and default values.
 
 ### Changing the Quorum Size
@@ -91,8 +125,10 @@ or truncating of the replicated log used
 
 Configuration options for the Aurora scheduler backup manager.
 
-* `-backup_interval`: The interval on which the scheduler writes local storage backups. The default is every hour.
-* `-backup_dir`: Directory to write backups to.
+* `-backup_interval`: The interval on which the scheduler writes local storage backups.
+   The default is every hour.
+* `-backup_dir`: Directory to write backups to. As stated above, this should not be co-located on the
+   same disk as the replicated log.
 * `-max_saved_backups`: Maximum number of backups to retain before deleting the oldest backup(s).
 
 
@@ -131,12 +167,29 @@ the latter needs to be enabled via:
 
     -enable_revocable_ram=true
 
-Unless you want to use the [default](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/resources/org/apache/aurora/scheduler/tiers.json)
+Unless you want to use the [default](https://github.com/apache/aurora/blob/rel/0.18.0/src/main/resources/org/apache/aurora/scheduler/tiers.json)
 tier configuration, you will also have to specify a file path:
 
     -tier_config=path/to/tiers/config.json
 
 
+## Multi-Framework Setup
+
+Aurora holds onto Mesos offers in order to provide efficient scheduling and
+[preemption](../../features/multitenancy/#preemption). This is problematic in multi-framework
+environments as Aurora might starve other frameworks.
+
+With a downside of increased scheduling latency, Aurora can be configured to be more cooperative:
+
+* Lowering `-min_offer_hold_time` (e.g. to `1mins`) can ensure unused offers are returned back to
+  Mesos more frequently.
+* Increasing `-offer_filter_duration` (e.g. to `30secs`) will instruct Mesos
+  not to re-offer rejected resources for the given duration.
+
+Setting a [minimum amount of resources](http://mesos.apache.org/documentation/latest/quota/) for
+each Mesos role can furthermore help to ensure no framework is starved entirely.
+
+
 ## Containers
 
 Both the Mesos and Docker containerizers require configuration of the Mesos agent.
@@ -249,3 +302,38 @@ Increasing executor overhead on an exist
 will result in degraded preemption performance until all task which began life with the previous
 executor configuration with less overhead are preempted/restarted.
 
+## Controlling MTTA via Update Affinity
+
+When there is high resource contention in your cluster you may experience noticeably elevated job update
+times, as well as high task churn across the cluster. This is due to Aurora's first-fit scheduling
+algorithm. To alleviate this, you can enable update affinity where the Scheduler will make a best-effort
+attempt to reuse the same agent for the updated task (so long as the resources for the job are not being
+increased).
+
+To enable this in the Scheduler, you can set the following options:
+
+    --enable_update_affinity=true
+    --update_affinity_reservation_hold_time=3mins
+
+You will need to tune the hold time to match the behavior you see in your cluster. If you have extremely
+high update throughput, you might have to extend it as processing updates could easily add significant
+delays between scheduling attempts. You may also have to tune scheduling parameters to achieve the
+throughput you need in your cluster. Some relevant settings (with defaults) are:
+
+    --max_schedule_attempts_per_sec=40
+    --initial_schedule_penalty=1secs
+    --max_schedule_penalty=1mins
+    --scheduling_max_batch_size=3
+    --max_tasks_per_schedule_attempt=5
+
+There are metrics exposed by the Scheduler which can provide guidance on where the bottleneck is.
+Example metrics to look at:
+
+    - schedule_attempts_blocks (if this number is greater than 0, then task throughput is hitting
+                                limits controlled by --max_schedule_attempts_per_sec)
+    - scheduled_task_penalty_* (metrics around scheduling penalties for tasks, if the numbers here are high
+                                then you could have high contention for resources)
+
+Most likely you'll run into limits with the number of update instances that can be processed per minute
+before you run into any other limits. So if your total work done per minute starts to exceed 2k instances,
+you may need to extend the update_affinity_reservation_hold_time.

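The first JVM recommendation in the configuration.md changes above (identical `-Xms` and `-Xmx`) is easy to check mechanically, for example when templating the scheduler's launch options. A minimal Python sketch, assuming the options are available as a list of strings; the `heap_settings_match` helper is hypothetical, compares sizes textually (so `4g` and `4096m` count as different), and only understands the plain `-Xms<size>`/`-Xmx<size>` forms.

```python
from typing import List, Optional


def _last_value(options: List[str], prefix: str) -> Optional[str]:
    # Assumption: when an option is repeated, the last occurrence wins.
    values = [opt[len(prefix):] for opt in options if opt.startswith(prefix)]
    return values[-1] if values else None


def heap_settings_match(options: List[str]) -> bool:
    """True when both -Xms and -Xmx are present and textually identical."""
    initial = _last_value(options, "-Xms")
    maximum = _last_value(options, "-Xmx")
    return initial is not None and initial == maximum
```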
Modified: aurora/site/source/documentation/latest/operations/installation.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/installation.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/operations/installation.md (original)
+++ aurora/site/source/documentation/latest/operations/installation.md Wed Jun 21 06:28:50 2017
@@ -26,6 +26,8 @@ profiles:
 A small number of machines (typically 3 or 5) responsible for cluster orchestration.  In most cases
 it is fine to co-locate these components in anything but very large clusters (> 1000 machines).
 Beyond that point, operators will likely want to manage these services on separate machines.
+In particular, you will want to use separate ZooKeeper ensembles for leader election and
+service discovery. Otherwise a service discovery error or outage can take down the entire cluster.
 
 In practice, 5 coordinators have been shown to reliably manage clusters with tens of thousands of
 machines.
@@ -140,7 +142,7 @@ CentOS: `sudo systemctl start aurora`
         wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.17.0-1.el7.centos.aurora.x86_64.rpm
         sudo yum install -y aurora-executor-0.17.0-1.el7.centos.aurora.x86_64.rpm
 
-### Configuration
+### Worker Configuration
 The executor typically does not require configuration.  Command line arguments can
 be passed to the executor using a command line argument on the scheduler.
 
@@ -194,6 +196,7 @@ Make an edit to add the `--mesos-root` f
       --log_to_stderr=google:INFO
     )
 
+
 ## Installing the client
 ### Ubuntu Trusty
 
@@ -214,7 +217,7 @@ Make an edit to add the `--mesos-root` f
     brew upgrade
     brew install aurora-cli
 
-### Configuration
+### Client Configuration
 Client configuration lives in a json file that describes the clusters available and how to reach
 them.  By default this file is at `/etc/aurora/clusters.json`.
 
@@ -247,66 +250,7 @@ are identical for both.
     sudo yum -y install mesos-1.1.0
 
 
-
 ## Troubleshooting
-So you've started your first cluster and are running into some issues? We've collected some common
-stumbling blocks and solutions here to help get you moving.
-
-### Replicated log not initialized
-
-#### Symptoms
-- Scheduler RPCs and web interface claim `Storage is not READY`
-- Scheduler log repeatedly prints messages like
-
-  ```
-  I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status
-  received a broadcasted recover request
-  I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
-  from a replica in EMPTY status
-  ```
-
-#### Solution
-When you create a new cluster, you need to inform a quorum of schedulers that they are safe to
-consider their database to be empty by [initializing](#finalizing) the
-replicated log. This is done to prevent the scheduler from modifying the cluster state in the event
-of multiple simultaneous disk failures or, more likely, misconfiguration of the replicated log path.
-
-
-### Scheduler not registered
-
-#### Symptoms
-Scheduler log contains
-
-    Framework has not been registered within the tolerated delay.
-
-#### Solution
-Double-check that the scheduler is configured correctly to reach the Mesos master. If you are registering
-the master in ZooKeeper, make sure command line argument to the master:
 
-    --zk=zk://$ZK_HOST:2181/mesos/master
-
-is the same as the one on the scheduler:
-
-    -mesos_master_address=zk://$ZK_HOST:2181/mesos/master
-
-
-### Scheduler not running
-
-### Symptom
-The scheduler process commits suicide regularly. This happens under error conditions, but
-also on purpose in regular intervals.
-
-## Solution
-Aurora is meant to be run under supervision. You have to configure a supervisor like
-[Monit](http://mmonit.com/monit/) or [supervisord](http://supervisord.org/) to run the scheduler
-and restart it whenever it fails or exists on purpose.
-
-Aurora supports an active health checking protocol on its admin HTTP interface - if a `GET /health`
-times out or returns anything other than `200 OK` the scheduler process is unhealthy and should be
-restarted.
-
-For example, monit can be configured with
-
-    if failed port 8081 send "GET /health HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
-
-assuming you set `-http_port=8081`.
+So you've started your first cluster and are running into some issues? We've collected some common
+stumbling blocks and solutions in our [Troubleshooting guide](../troubleshooting/) to help get you moving.

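The health-checking protocol described in the text this hunk moves out of installation.md (a `GET /health` that times out or returns anything other than `200 OK` means the scheduler should be restarted) can be probed with a few lines of Python, for example from a custom supervisor. This is a minimal sketch; the `is_healthy` name, the 2-second default timeout and the URL (which assumes `-http_port=8081`) are illustrative choices, not part of Aurora.

```python
import http.client
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET <url> answers with status 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), refused connections and socket timeouts
        # are all OSError subclasses; malformed responses raise HTTPException.
        return False
```

A supervisor would typically call `is_healthy("http://127.0.0.1:8081/health")` on an interval and restart the process only after several consecutive failures, mirroring the monit rule quoted above.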
Modified: aurora/site/source/documentation/latest/operations/storage.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/storage.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/operations/storage.md (original)
+++ aurora/site/source/documentation/latest/operations/storage.md Wed Jun 21 06:28:50 2017
@@ -1,8 +1,6 @@
 # Aurora Scheduler Storage
 
 - [Overview](#overview)
-- [Replicated Log Configuration](#replicated-log-configuration)
-- [Backup Configuration](#replicated-log-configuration)
 - [Storage Semantics](#storage-semantics)
   - [Reads, writes, modifications](#reads-writes-modifications)
     - [Read lifecycle](#read-lifecycle)
@@ -21,8 +19,9 @@ For example:
 * Production resource quotas
 * Mesos resource offer host attributes
 
-Aurora solves its persistence needs by leveraging the Mesos implementation of a Paxos replicated
-log [[1]](https://ramcloud.stanford.edu/~ongaro/userstudy/paxos.pdf)
+Aurora solves its persistence needs by leveraging the
+[Mesos implementation of a Paxos replicated log](http://mesos.apache.org/documentation/latest/replicated-log-internals/)
+[[1]](https://ramcloud.stanford.edu/~ongaro/userstudy/paxos.pdf)
 [[2]](http://en.wikipedia.org/wiki/State_machine_replication) with a key-value
 [LevelDB](https://github.com/google/leveldb) storage as persistence media.
 

Modified: aurora/site/source/documentation/latest/reference/client-commands.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/client-commands.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/client-commands.md (original)
+++ aurora/site/source/documentation/latest/reference/client-commands.md Wed Jun 21 06:28:50 2017
@@ -25,6 +25,7 @@ Aurora Client Commands
     - [Getting Job Status](#getting-job-status)
     - [Opening the Web UI](#opening-the-web-ui)
     - [SSHing to a Specific Task Machine](#sshing-to-a-specific-task-machine)
+    - [SCPing with Specific Task Machines](#scping-with-specific-task-machines)
     - [Templating Command Arguments](#templating-command-arguments)
 
 Introduction
@@ -299,6 +300,18 @@ assigned a particular Job/shard number.
 diagnosing issues such as performance issues or abnormal behavior on a
 particular machine.
 
+### SCPing with Specific Task Machines
+
+    aurora task scp [<cluster>/<role>/<env>/<job_name>/<instance_id>]:source [<cluster>/<role>/<env>/<job_name>/<instance_id>]:dest
+
+You can have the Aurora client copy files or directories to, from, and between
+individual tasks. Relative paths are resolved against the task's sandbox
+directory, which is the same directory you see when you browse `chroot` from
+the Scheduler task UI. Absolute paths (such as `/tmp`) also work, but tilde
+(`~`) expansion is not supported. Currently, this command is fully supported
+only for Mesos containers. You can copy files out of Docker containers, but
+not into them.
+
 ### Templating Command Arguments
 
     aurora task run [-e] [-t THREADS] <job_key> -- <<command-line>>

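The task-qualified arguments that `aurora task scp` takes can be assembled in a script. The following is a minimal Python sketch, not part of the Aurora client: the helper names `task_path` and `scp_command`, and the cluster, role, environment, and job names in the example, are hypothetical. It joins a task key and a path with `:` in the syntax shown above and rejects paths that start with `~`, since the client does not expand tildes.

```python
from typing import List


def task_path(cluster: str, role: str, env: str, job_name: str,
              instance_id: int, path: str) -> str:
    """Hypothetical helper: build '<cluster>/<role>/<env>/<job_name>/<instance_id>:path'.

    Relative paths are interpreted by the client against the task sandbox.
    """
    if path.startswith("~"):
        # Tilde expansion is not supported by `aurora task scp`.
        raise ValueError("tilde expansion is not supported: %r" % path)
    task_key = "/".join([cluster, role, env, job_name, str(instance_id)])
    return "%s:%s" % (task_key, path)


def scp_command(source: str, dest: str) -> List[str]:
    """Hypothetical helper: argv for `aurora task scp <source> <dest>`."""
    return ["aurora", "task", "scp", source, dest]


# Copy a file from instance 0 into /tmp on instance 1 of the same (hypothetical) job.
argv = scp_command(
    task_path("devcluster", "www-data", "prod", "hello", 0, "data.txt"),
    task_path("devcluster", "www-data", "prod", "hello", 1, "/tmp/data.txt"),
)
```

The resulting `argv` list could be passed to `subprocess.run`; keeping the validation in the helper surfaces the unsupported `~` case before the client is invoked.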
Modified: aurora/site/source/documentation/latest/reference/configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/configuration.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/configuration.md (original)
+++ aurora/site/source/documentation/latest/reference/configuration.md Wed Jun 21 06:28:50 2017
@@ -468,6 +468,15 @@ unified-container, the container can be
   param            | type                           | description
   -----            | :----:                         | -----------
   ```image```      | Choice(AppcImage, DockerImage) | An optional filesystem image to use within this container.
+  ```volumes```    | List(Volume)                   | An optional list of volume mounts for this container.
+
+### Volume Object
+
+  param                  | type     | description
+  -----                  | :----:   | -----------
+  ```container_path```   | String   | Mount point in the container.
+  ```volume_path```      | String   | Path on the host to mount.
+  ```mode```             | Enum     | Mode of the mount, can be 'RW' or 'RO'.
 
 ### AppcImage
 

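To illustrate the shape of a `Volume` entry, here is a minimal Python sketch that models the three fields from the table above as a dataclass and rejects any `mode` other than 'RW' or 'RO'. It is a hypothetical stand-in for illustration only, not Aurora's configuration API, and the example paths are made up.

```python
from dataclasses import dataclass

# The two mount modes listed in the Volume table.
VALID_MODES = ("RW", "RO")


@dataclass(frozen=True)
class Volume:
    """Hypothetical model of a Volume object; not Aurora's own class."""
    container_path: str
    volume_path: str
    mode: str

    def __post_init__(self) -> None:
        # Reject anything other than the documented modes.
        if self.mode not in VALID_MODES:
            raise ValueError("mode must be 'RW' or 'RO', got %r" % self.mode)


# A container's `volumes` parameter is a list of such objects (example values).
volumes = [Volume(container_path="/data", volume_path="/var/lib/data", mode="RO")]
```

Making the dataclass frozen mirrors the fact that a mount is declared once in the job configuration and not mutated afterwards.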
Modified: aurora/site/source/documentation/latest/reference/scheduler-configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/scheduler-configuration.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/scheduler-configuration.md (original)
+++ aurora/site/source/documentation/latest/reference/scheduler-configuration.md Wed Jun 21 06:28:50 2017
@@ -190,6 +190,8 @@ Optional flags:
 	If true, Aurora populates DiscoveryInfo field of Mesos TaskInfo.
 -preemption_delay (default (3, mins))
 	Time interval after which a pending task becomes eligible to preempt other tasks
+-preemption_slot_finder_modules (default [class org.apache.aurora.scheduler.preemptor.PendingTaskProcessorModule, class org.apache.aurora.scheduler.preemptor.PreemptionVictimFilterModule])
+	Guice modules for replacing preemption logic.
 -preemption_slot_hold_time (default (5, mins))
 	Time to hold a preemption slot found before it is discarded.
 -preemption_slot_search_interval (default (1, mins))
@@ -234,6 +236,8 @@ Optional flags:
 	Time for a stat to be retained in memory before expiring.
 -stat_sampling_interval (default (1, secs))
 	Statistic value sampling interval.
+-task_assigner_modules (default [class org.apache.aurora.scheduler.state.FirstFitTaskAssignerModule])
+	Guice modules for replacing task assignment logic.
 -thermos_executor_cpu (default 0.25)
 	The number of CPU cores to allocate for each instance of the executor.
 -thermos_executor_flags

Modified: aurora/site/source/documentation/latest/reference/scheduler-endpoints.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/scheduler-endpoints.md?rev=1799388&r1=1799387&r2=1799388&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/scheduler-endpoints.md (original)
+++ aurora/site/source/documentation/latest/reference/scheduler-endpoints.md Wed Jun 21 06:28:50 2017
@@ -1,7 +1,7 @@
 # HTTP endpoints
 
 There are a number of HTTP endpoints that the Aurora scheduler exposes. These allow various
-operational tasks to be performed on the scheduler. Below is the list of all such endpoints
+operational tasks to be performed on the scheduler. Below is an (incomplete) list of such endpoints
 and a brief explanation of what they do.
 
 ## Leader health
@@ -12,8 +12,8 @@ HAProxy or AWS ELB.
 When an HTTP GET request is issued on this endpoint, it responds as follows:
 
 - If the instance that received the GET request is the leading scheduler, an HTTP status code of
-  200 (OK) is returned.
+  `200 OK` is returned.
 - If the instance that received the GET request is not the leading scheduler but a leader does
-  exist, a HTTP status code of 503 (SERVICE_UNAVAILABLE) is returned.
-- If no leader currently exists or the leader is unknown, a HTTP status code of 502
-  (BAD_GATEWAY) is returned.
\ No newline at end of file
+  exist, an HTTP status code of `503 SERVICE_UNAVAILABLE` is returned.
+- If no leader currently exists or the leader is unknown, an HTTP status code of `502 BAD_GATEWAY`
+  is returned.

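A health checker only needs the status code to tell the three leader-health cases apart. The following minimal Python sketch uses only the standard library; it assumes the endpoint is served at `/leaderhealth` on the scheduler's HTTP port, and the `classify`/`probe` names and the address in the usage comment are placeholders, not part of Aurora.

```python
import urllib.error
import urllib.request

# Status codes documented for the leader health endpoint.
LEADER = 200       # this instance is the leading scheduler
NOT_LEADER = 503   # another instance is the leader
NO_LEADER = 502    # no leader exists, or the leader is unknown


def classify(status: int) -> str:
    """Map a leader-health HTTP status code to a readable state."""
    return {
        LEADER: "leader",
        NOT_LEADER: "not-leader",
        NO_LEADER: "no-leader",
    }.get(status, "unexpected")


def probe(base_url: str, timeout: float = 5.0) -> str:
    """GET <base_url>/leaderhealth (path is an assumption) and classify the result.

    Connection failures (urllib.error.URLError) are left to the caller.
    """
    try:
        with urllib.request.urlopen(base_url + "/leaderhealth", timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error responses such as 502 and 503.
        return classify(err.code)


# Example (placeholder address):
# print(probe("http://scheduler.example.com:8081"))
```

Treating only `200 OK` as healthy is what lets a load balancer such as HAProxy or AWS ELB send traffic to the leading scheduler alone.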

