aurora-commits mailing list archives

From wfar...@apache.org
Subject svn commit: r1719755 [3/4] - in /aurora/site/source/documentation/latest: ./ images/ images/presentations/
Date Sun, 13 Dec 2015 00:06:37 GMT
Added: aurora/site/source/documentation/latest/images/debugging-client-test.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/debugging-client-test.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/debugging-client-test.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/killedtask.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/killedtask.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/killedtask.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/lifeofatask.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/lifeofatask.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/lifeofatask.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_adopters_panel_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_adopters_panel_thumb.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_adopters_panel_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_at_tellapart_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_at_tellapart_thumb.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_at_tellapart_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_at_twitter_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_at_twitter_thumb.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/presentations/02_19_2015_aurora_at_twitter_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/presentations/02_28_2015_apache_aurora_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/presentations/02_28_2015_apache_aurora_thumb.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/presentations/02_28_2015_apache_aurora_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/presentations/03_25_2014_introduction_to_aurora_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/presentations/03_25_2014_introduction_to_aurora_thumb.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/presentations/03_25_2014_introduction_to_aurora_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/presentations/04_30_2015_monolith_to_microservices_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/presentations/04_30_2015_monolith_to_microservices_thumb.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/presentations/04_30_2015_monolith_to_microservices_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/presentations/08_21_2014_past_present_future_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/presentations/08_21_2014_past_present_future_thumb.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/presentations/08_21_2014_past_present_future_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/runningtask.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/runningtask.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/runningtask.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/stderr.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/stderr.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/stderr.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/stdout.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/stdout.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/stdout.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/images/storage_hierarchy.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/images/storage_hierarchy.png?rev=1719755&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/latest/images/storage_hierarchy.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/latest/index.html.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/index.html.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/index.html.md (added)
+++ aurora/site/source/documentation/latest/index.html.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,37 @@
+## Introduction
+Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run long-running services that take advantage of Apache Mesos' scalability, fault-tolerance, and resource isolation. This documentation has been organized into sections with three audiences in mind:
+ 
+ * Users: General information about the project and how to run an Aurora job.
+ * Operators: For those who wish to manage and fine-tune an Aurora cluster.
+ * Developers: All the information you need to start modifying Aurora and contributing back to the project.
+
+We encourage you to ask questions on the [Aurora developer list](http://aurora.apache.org/community/) or the `#aurora` IRC channel on `irc.freenode.net`.
+
+## Users
+ * [Install Aurora on virtual machines on your local machine](/documentation/latest//)
+ * [Hello World Tutorial](/documentation/latest//)
+ * [User Guide](/documentation/latest//)
+ * [Configuration Tutorial](/documentation/latest//)
+ * [Aurora + Thermos Reference](/documentation/latest//)
+ * [Command Line Client](/documentation/latest//)
+ * [Cron Jobs](/documentation/latest//)
+
+## Operators
+ * [Deploy Aurora](/documentation/latest//)
+ * [Monitoring](/documentation/latest//)
+ * [Hooks for Aurora Client API](/documentation/latest//)
+ * [Scheduler Storage](/documentation/latest//)
+ * [Scheduler Storage and Maintenance](/documentation/latest//)
+ * [SLA Measurement](/documentation/latest//)
+ * [Resource Isolation and Sizing](/documentation/latest//)
+ * [Generating test resources](/documentation/latest//)
+
+## Developers
+ * [Contributing to the project](contributing/)
+ * [Developing the Aurora Scheduler](/documentation/latest//)
+ * [Developing the Aurora Client](/documentation/latest//)
+ * [Committers Guide](/documentation/latest//)
+ * [Build System](/documentation/latest//)
+ 
+## Additional Resources
+ * [Presentation videos and slides](/documentation/latest//)

Added: aurora/site/source/documentation/latest/monitoring.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/monitoring.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/monitoring.md (added)
+++ aurora/site/source/documentation/latest/monitoring.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,181 @@
+# Monitoring your Aurora cluster
+
+Before you start running important services in your Aurora cluster, it's important to set up
+monitoring and alerting of Aurora itself.  Most of your monitoring can be against the scheduler,
+since it will give you a global view of what's going on.
+
+## Reading stats
+The scheduler exposes a *lot* of instrumentation data via its HTTP interface. You can get a quick
+peek at the first few of these in our vagrant image:
+
+    $ vagrant ssh -c 'curl -s localhost:8081/vars | head'
+    async_tasks_completed 1004
+    attribute_store_fetch_all_events 15
+    attribute_store_fetch_all_events_per_sec 0.0
+    attribute_store_fetch_all_nanos_per_event 0.0
+    attribute_store_fetch_all_nanos_total 3048285
+    attribute_store_fetch_all_nanos_total_per_sec 0.0
+    attribute_store_fetch_one_events 3391
+    attribute_store_fetch_one_events_per_sec 0.0
+    attribute_store_fetch_one_nanos_per_event 0.0
+    attribute_store_fetch_one_nanos_total 454690753
+
+These values are served as `Content-Type: text/plain`, with each line containing a space-separated metric
+name and value. Values may be integers, doubles, or strings (string-valued stats are static,
+while numeric ones may change over time).
+
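+For scripting against this endpoint, the format is trivial to parse. A minimal sketch, assuming
+Python 3 and the default host/port from the vagrant image (`fetch_vars` is a name made up here):
+
+```python
+import urllib.request
+
+def fetch_vars(host="localhost", port=8081):
+    """Fetch /vars and parse each 'name value' line into a dict."""
+    body = urllib.request.urlopen("http://%s:%d/vars" % (host, port)).read().decode()
+    stats = {}
+    for line in body.splitlines():
+        name, _, value = line.partition(" ")
+        try:
+            stats[name] = float(value)  # integer and double stats
+        except ValueError:
+            stats[name] = value  # static string stats
+    return stats
+```
+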
+If your monitoring infrastructure prefers JSON, the scheduler exports that as well:
+
+    $ vagrant ssh -c 'curl -s localhost:8081/vars.json | python -mjson.tool | head'
+    {
+        "async_tasks_completed": 1009,
+        "attribute_store_fetch_all_events": 15,
+        "attribute_store_fetch_all_events_per_sec": 0.0,
+        "attribute_store_fetch_all_nanos_per_event": 0.0,
+        "attribute_store_fetch_all_nanos_total": 3048285,
+        "attribute_store_fetch_all_nanos_total_per_sec": 0.0,
+        "attribute_store_fetch_one_events": 3409,
+        "attribute_store_fetch_one_events_per_sec": 0.0,
+        "attribute_store_fetch_one_nanos_per_event": 0.0,
+
+This will be the same data as above, served with `Content-Type: application/json`.
+
+## Viewing live stat samples on the scheduler
+The scheduler uses the Twitter commons stats library, which keeps an internal time-series database
+of exported variables - nearly everything in `/vars` is available for instant graphing.  This is
+useful for debugging, but is not a replacement for an external monitoring system.
+
+You can view these graphs on a scheduler at `/graphview`.  It supports some composition and
+aggregation of values, which can be invaluable when triaging a problem.  For example, if you have
+the scheduler running in vagrant, check out these links:
+
+* [simple graph](http://192.168.33.7:8081/graphview?query=jvm_uptime_secs)
+* [complex composition](http://192.168.33.7:8081/graphview?query=rate\(scheduler_log_native_append_nanos_total\)%2Frate\(scheduler_log_native_append_events\)%2F1e6)
+
+### Counters and gauges
+Among numeric stats, there are two fundamental types exported: _counters_ and _gauges_.
+Counters are guaranteed to be monotonically-increasing for the lifetime of a process, while gauges
+may decrease in value.  Aurora uses counters to represent things like the number of times an event
+has occurred, and gauges to capture things like the current length of a queue.  Counters are a
+natural fit for accurate composition into [rate ratios](http://en.wikipedia.org/wiki/Rate_ratio)
+(useful for sample-resistant latency calculation), while gauges are not.
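+
+As a sketch of that composition (not part of Aurora itself; it assumes two samples of the
+parsed `/vars` dict taken one collection window apart, and `rate_ratio_ms` is a made-up name):
+
+```python
+def rate_ratio_ms(before, after, nanos_key, events_key):
+    """Windowed latency in ms from two counter samples, e.g.
+    nanos_key='scheduler_log_native_append_nanos_total' and
+    events_key='scheduler_log_native_append_events'."""
+    d_nanos = after[nanos_key] - before[nanos_key]
+    d_events = after[events_key] - before[events_key]
+    return d_nanos / d_events / 1e6 if d_events else 0.0
+```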
+
+# Alerting
+
+## Quickstart
+If you are looking for just bare-minimum alerting to get something in place quickly, set up alerting
+on `framework_registered` and `task_store_LOST`. These will give you a decent picture of overall
+health.
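+
+As an illustrative (not prescriptive) sketch of such a check, given one parsed `/vars` dict per
+scheduler instance (`basic_health_alerts` is a made-up name; thresholds are placeholders):
+
+```python
+def basic_health_alerts(scheduler_stats):
+    """scheduler_stats: list of stat dicts, one per scheduler instance."""
+    alerts = []
+    leaders = sum(s.get("framework_registered", 0) for s in scheduler_stats)
+    if leaders != 1:  # no leader, or a possible split brain
+        alerts.append("expected exactly 1 registered scheduler, saw %d" % leaders)
+    lost = max(s.get("task_store_LOST", 0) for s in scheduler_stats)
+    if lost > 0:  # tune this threshold (or alert on its rate) for your cluster
+        alerts.append("%d tasks in LOST state" % lost)
+    return alerts
+```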
+
+## A note on thresholds
+One of the most difficult things in monitoring is choosing alert thresholds. With many of these
+stats, there is no value we can offer as a threshold that will be guaranteed to work for you. It
+will depend on the size of your cluster, number of jobs, churn of tasks in the cluster, etc. We
+recommend you start with a strict value after viewing a small amount of collected data, and then
+adjust thresholds as you see fit. Feel free to ask us if you would like to validate that your alerts
+and thresholds make sense.
+
+## Important stats
+
+### `jvm_uptime_secs`
+Type: integer counter
+
+The number of seconds the JVM process has been running. Comes from
+[RuntimeMXBean#getUptime()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/RuntimeMXBean.html#getUptime\(\)).
+
+Detecting resets (decreasing values) on this stat will tell you that the scheduler is failing to
+stay alive.
+
+Look at the scheduler logs to identify the reason the scheduler is exiting.
+
+### `system_load_avg`
+Type: double gauge
+
+The current load average of the system for the last minute. Comes from
+[OperatingSystemMXBean#getSystemLoadAverage()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html?is-external=true#getSystemLoadAverage\(\)).
+
+A high sustained value suggests that the scheduler machine may be over-utilized.
+
+Use standard unix tools like `top` and `ps` to track down the offending process(es).
+
+### `process_cpu_cores_utilized`
+Type: double gauge
+
+The current number of CPU cores in use by the JVM process. This should not exceed the number of
+logical CPU cores on the machine. Derived from
+[OperatingSystemMXBean#getProcessCpuTime()](http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html).
+
+A high sustained value indicates that the scheduler is overworked. Due to current internal design
+limitations, if this value is sustained at `1`, there is a good chance the scheduler is under water.
+
+There are two main inputs that tend to drive this figure: task scheduling attempts and status
+updates from Mesos.  You may see activity in the scheduler logs to give an indication of where
+time is being spent.  Beyond that, it really takes good familiarity with the code to effectively
+triage this.  We suggest engaging with an Aurora developer.
+
+### `task_store_LOST`
+Type: integer gauge
+
+The number of tasks stored in the scheduler that are in the `LOST` state, and have been rescheduled.
+
+If this value is increasing at a high rate, it is a sign of trouble.
+
+There are many sources of `LOST` tasks in Mesos: the scheduler, master, slave, and executor can all
+trigger this.  The first step is to look in the scheduler logs for `LOST` to identify where the
+state changes are originating.
+
+### `scheduler_resource_offers`
+Type: integer counter
+
+The number of resource offers that the scheduler has received.
+
+For a healthy scheduler, this value must be increasing over time.
+
+Assuming the scheduler is up and otherwise healthy, you will want to check if the master thinks it
+is sending offers. You should also look at the master's web interface to see if it has a large
+number of outstanding offers that it is waiting for schedulers to return.
+
+### `framework_registered`
+Type: binary integer counter
+
+Will be `1` for the leading scheduler that is registered with the Mesos master, and `0` for
+passive schedulers.
+
+A sustained period without a `1` (or where `sum() != 1`) warrants investigation.
+
+If there is no leading scheduler, look in the scheduler and master logs for why.  If there are
+multiple schedulers claiming leadership, this suggests a split brain and warrants filing a critical
+bug.
+
+### `rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)`
+Type: rate ratio of integer counters
+
+This composes two counters to compute a windowed figure for the latency of replicated log writes.
+
+A hike in this value suggests disk bandwidth contention.
+
+Look in scheduler logs for any reported oddness with saving to the replicated log. Also use
+standard tools like `vmstat` and `iotop` to identify whether the disk has become slow or
+over-utilized. We suggest using a dedicated disk for the replicated log to mitigate this.
+
+### `timed_out_tasks`
+Type: integer counter
+
+Tracks the number of times the scheduler has given up while waiting
+(for `-transient_task_state_timeout`) to hear back about a task that is in a transient state
+(e.g. `ASSIGNED`, `KILLING`), moving it to `LOST` before rescheduling.
+
+This value is currently known to increase occasionally when the scheduler fails over
+([AURORA-740](https://issues.apache.org/jira/browse/AURORA-740)). However, any large spike in this
+value warrants investigation.
+
+The scheduler will log when it times out a task. You should trace the task ID of the timed out
+task into the master, slave, and/or executors to determine where the message was dropped.
+
+### `http_500_responses_events`
+Type: integer counter
+
+The total number of HTTP 500 status responses sent by the scheduler. Includes API and asset serving.
+
+An increase warrants investigation.
+
+Look in scheduler logs to identify why the scheduler returned a 500; there should be a stack trace.

Added: aurora/site/source/documentation/latest/presentations.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/presentations.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/presentations.md (added)
+++ aurora/site/source/documentation/latest/presentations.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,50 @@
+# Apache Aurora Presentations
+Video and slides from presentations and panel discussions about Apache Aurora.
+
+_(Listed in date descending order)_
+
+<table>
+	<tr>
+		<td><img src="images/presentations/04_30_2015_monolith_to_microservices_thumb.png" alt="From Monolith to Microservices with Aurora Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=yXkOgnyK4Hw">From Monolith to Microservices w/ Aurora (Video)</a></strong>
+		<p>Presented by Thanos Baskous, Tony Dong, Dobromir Montauk</p>
+		<p>April 30, 2015 at <a href="http://www.meetup.com/Bay-Area-Apache-Aurora-Users-Group/events/221219480/">Bay Area Apache Aurora Users Group</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="images/presentations/02_28_2015_apache_aurora_thumb.png" alt="Apache Auroraの始めかた / Getting Started with Apache Aurora Slideshow Thumbnail" /></td>
+		<td><strong><a href="http://www.slideshare.net/zembutsu/apache-aurora-introduction-and-tutorial-osc15tk">Apache Auroraの始めかた / Getting Started with Apache Aurora (Slides)</a></strong>
+		<p>Presented by Masahito Zembutsu</p>
+		<p>February 28, 2015 at <a href="http://www.ospn.jp/osc2015-spring/">Open Source Conference 2015 Tokyo Spring</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="images/presentations/02_19_2015_aurora_adopters_panel_thumb.png" alt="Apache Aurora Adopters Panel Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=2Jsj0zFdRlg">Apache Aurora Adopters Panel (Video)</a></strong>
+		<p>Panelists Ben Staffin, Josh Adams, Bill Farner, Berk Demir</p>
+		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="images/presentations/02_19_2015_aurora_at_twitter_thumb.png" alt="Operating Apache Aurora and Mesos at Twitter Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=E4lxX6epM_U">Operating Apache Aurora and Mesos at Twitter (Video)</a></strong>
+		<p>Presented by Joe Smith</p>
+		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="images/presentations/02_19_2015_aurora_at_tellapart_thumb.png" alt="Apache Aurora and Mesos at TellApart" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=ZZXtXLvTXAE">Apache Aurora and Mesos at TellApart (Video)</a></strong>
+		<p>Presented by Steve Niemitz</p>
+		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="images/presentations/08_21_2014_past_present_future_thumb.png" alt="Past, Present, and Future of the Aurora Scheduler Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=Dsc5CPhKs4o">Past, Present, and Future of the Aurora Scheduler (Video)</a></strong>
+		<p>Presented by Bill Farner</p>
+		<p>August 21, 2014 at <a href="http://events.linuxfoundation.org/events/archive/2014/mesoscon">#MesosCon 2014</a></p>
+</td>
+	</tr>
+	<tr>
+		<td><img src="images/presentations/03_25_2014_introduction_to_aurora_thumb.png" alt="Introduction to Apache Aurora Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=asd_h6VzaJc">Introduction to Apache Aurora (Video)</a></strong>
+		<p>Presented by Bill Farner</p>
+		<p>March 25, 2014 at <a href="https://www.eventbrite.com/e/aurora-and-mesosframeworksmeetup-tickets-10850994617">Aurora and Mesos Frameworks Meetup</a></p></td>
+	</tr>
+</table>
\ No newline at end of file

Added: aurora/site/source/documentation/latest/resources.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/resources.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/resources.md (added)
+++ aurora/site/source/documentation/latest/resources.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,164 @@
+Resources and Sizing
+=============================
+
+- [Introduction](#introduction)
+- [CPU Isolation](#cpu-isolation)
+- [CPU Sizing](#cpu-sizing)
+- [Memory Isolation](#memory-isolation)
+- [Memory Sizing](#memory-sizing)
+- [Disk Space](#disk-space)
+- [Disk Space Sizing](#disk-space-sizing)
+- [Other Resources](#other-resources)
+- [Resource Quota](#resource-quota)
+- [Task Preemption](#task-preemption)
+
+## Introduction
+
+Aurora is a multi-tenant system; a single software instance runs on a
+server, serving multiple clients/tenants. To share resources among
+tenants, it implements isolation of:
+
+* CPU
+* memory
+* disk space
+
+CPU is a soft limit, and handled differently from memory and disk space.
+Too low a CPU value results in throttling your application and
+slowing it down. Memory and disk space are both hard limits; when your
+application goes over these values, it's killed.
+
+Let's look at each resource type in more detail:
+
+## CPU Isolation
+
+Mesos uses a quota-based CPU scheduler (the *Completely Fair Scheduler*)
+to provide consistent and predictable performance.  This is effectively
+a guarantee of resources -- you receive at least what you requested, but
+also no more than you've requested.
+
+The scheduler gives applications a CPU quota for every 100 ms interval.
+When an application uses its quota for an interval, it is throttled for
+the rest of the 100 ms. Usage resets for each interval and unused
+quota does not carry over.
+
+For example, an application specifying 4.0 CPU has access to 400 ms of
+CPU time every 100 ms. This CPU quota can be used in different ways,
+depending on the application and available resources. Consider the
+scenarios shown in this diagram.
+
+![CPU Availability](images/CPUavailability.png)
+
+* *Scenario A*: the application can use up to 4 cores continuously for
+every 100 ms interval. It is never throttled and starts processing
+new requests immediately.
+
+* *Scenario B* : the application uses up to 8 cores (depending on
+availability) but is throttled after 50 ms. The CPU quota resets at the
+start of each new 100 ms interval.
+
+* *Scenario C* : is like Scenario A, but there is a garbage collection
+event in the second interval that consumes all CPU quota. The
+application throttles for the remaining 75 ms of that interval and
+cannot service requests until the next interval. In this example, the
+garbage collection finished in one interval but, depending on how much
+garbage needs collecting, it may take more than one interval and further
+delay service of requests.
+
+*Technical Note*: Mesos considers logical cores, also known as
+hyperthreading or SMT cores, as the unit of CPU.
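+
+The quota mechanics above can be modeled in a few lines. This is a simplification that assumes
+the task runs flat out on a fixed number of cores (`throttled_ms` is a name made up here):
+
+```python
+def throttled_ms(cpus_requested, cores_used, period_ms=100):
+    """Wall time spent throttled within one quota period."""
+    quota_ms = cpus_requested * period_ms  # CPU-time budget per period
+    busy_ms = quota_ms / cores_used        # wall time until the budget is spent
+    return max(0.0, period_ms - busy_ms)
+
+assert throttled_ms(4.0, 4) == 0.0   # Scenario A: never throttled
+assert throttled_ms(4.0, 8) == 50.0  # Scenario B: throttled after 50 ms
+```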
+
+## CPU Sizing
+
+To correctly size Aurora-run Mesos tasks, specify a per-shard CPU value
+that lets the task run at its desired performance when at peak load
+distributed across all shards. Include reserve capacity of at least 50%,
+possibly more, depending on how critical your service is (or how
+confident you are about your original estimate :-)), ideally by
+increasing the number of shards to also improve resiliency. When running
+your application, observe its CPU stats over time. If consistently at or
+near your quota during peak load, you should consider increasing either
+per-shard CPU or the number of shards.
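+
+As back-of-the-envelope arithmetic (the 50% headroom figure comes from the guidance above;
+`per_shard_cpu` is a made-up helper and the numbers are illustrative):
+
+```python
+def per_shard_cpu(peak_load_cores, shards, headroom=0.5):
+    """Per-shard CPU value covering peak load plus reserve capacity."""
+    return peak_load_cores * (1 + headroom) / shards
+
+print(per_shard_cpu(12, 10))  # 12 cores of peak load, 10 shards -> 1.8 CPUs each
+```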
+
+## Memory Isolation
+
+Mesos uses dedicated memory allocation. Your application always has
+access to the amount of memory specified in your configuration. The
+application's memory use is defined as the sum of the resident set size
+(RSS) of all processes in a shard. Each shard is considered
+independently.
+
+In other words, say you specified a memory size of 10GB. Each shard
+would receive 10GB of memory. If an individual shard's memory demands
+exceed 10GB, that shard is killed, but the other shards continue
+working.
+
+*Technical note*: Total memory size is not enforced at allocation time,
+so your application can request more than its allocation without getting
+an ENOMEM. However, it will be killed shortly after.
+
+## Memory Sizing
+
+Size for your application's peak requirement. Observe the per-instance
+memory statistics over time, as memory requirements can vary over
+different periods. Remember that if your application exceeds its memory
+value, it will be killed, so you should also add a safety margin of
+around 10-20%. If you have the ability to do so, you may also want to
+put alerts on the per-instance memory.
+
+## Disk Space
+
+Disk space used by your application is defined as the sum of the files'
+disk space in your application's directory, including the `stdout` and
+`stderr` logged from your application. Each shard is considered
+independently. You should use off-node storage for your application's
+data whenever possible.
+
+In other words, say you specified disk space size of 100MB. Each shard
+would receive 100MB of disk space. If an individual shard's disk space
+demands exceed 100MB, that shard is killed, but the other shards
+continue working.
+
+After your application finishes running, its allocated disk space is
+reclaimed. Thus, your job's final action should move any disk content
+that you want to keep, such as logs, to your home file system or other
+less transitory storage. Disk reclamation takes place an undefined
+period after the application finish time; until then, the disk contents
+are still available but you shouldn't count on them being so.
+
+*Technical note* : Disk space is not enforced at write so your
+application can write above its quota without getting an ENOSPC, but it
+will be killed shortly after. This is subject to change.
+
+## Disk Space Sizing
+
+Size for your application's peak requirement. Rotate and discard log
+files as needed to stay within your quota. When running a Java process,
+add the maximum size of the Java heap to your disk space requirement, in
+order to account for an out of memory error dumping the heap
+into the application's sandbox space.
+
+## Other Resources
+
+Other resources, such as network bandwidth, do not have any performance
+guarantees. For some resources, such as memory bandwidth, there are no
+practical sharing methods so some application combinations collocated on
+the same host may cause contention.
+
+## Resource Quota
+
+Aurora requires resource quotas for
+[production non-dedicated jobs](/documentation/latest//). Quota is enforced at
+the job role level and when set, defines a non-preemptible pool of compute resources within
+that role.
+
+To grant quota to a particular role in production, use the `aurora_admin set_quota` command.
+
+NOTE: all job types (service, adhoc or cron) require role resource quota unless a job has a
+[dedicated constraint set](/documentation/latest//).
+
+## Task Preemption
+
+Under resource shortage pressure, tasks from
+[production](/documentation/latest//) jobs may preempt tasks from any non-production
+job. A production task may only be preempted by tasks from production jobs in the same role with
+higher [priority](/documentation/latest//).
\ No newline at end of file

Added: aurora/site/source/documentation/latest/scheduler-storage.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/scheduler-storage.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/scheduler-storage.md (added)
+++ aurora/site/source/documentation/latest/scheduler-storage.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,47 @@
+# Snapshot Performance
+
+Periodically the scheduler writes a full snapshot of its state to the replicated log. To do this
+it needs to hold a global storage write lock while it writes out this data. In large clusters
+this has been observed to take up to 40 seconds. Long pauses can cause issues in the system,
+including delays in scheduling new tasks.
+
+The scheduler has two optimizations to reduce the size of snapshots and thus improve snapshot
+performance: compression and deduplication. Most users will want to enable both compression
+and deduplication.
+
+## Compression
+
+To reduce the size of the snapshot the DEFLATE algorithm can be applied to the serialized bytes
+of the snapshot as they are written to the stream. This reduces the total number of bytes that
+need to be written to the replicated log at the cost of CPU, and generally reduces the amount
+of time a snapshot takes.
+
+### Enabling Compression
+
+Snapshot compression is enabled via the `-deflate_snapshots` flag. This is the default since
+Aurora 0.5.0. All released versions of Aurora can read both compressed and uncompressed snapshots,
+so there are no backwards compatibility concerns associated with changing this flag.
+
+### Disabling compression
+
+Disable compression by passing `-deflate_snapshots=false`.
+
+## Deduplication
+
+In Aurora 0.6.0 a new snapshot format was introduced. Rather than writing one configuration blob
+per Mesos task, this format stores each configuration blob once and gives each Mesos task a
+pointer to its blob. This format is not backwards compatible with earlier versions of Aurora.
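+
+Conceptually, the format is a simple content-addressed table. A toy model of the idea
+(this is not the actual snapshot wire format, and `deduplicate` is a made-up name):
+
+```python
+def deduplicate(tasks):
+    """tasks: iterable of (task_id, config_blob) pairs.
+    Returns one copy of each distinct blob plus per-task pointers."""
+    blobs, pointers = {}, {}
+    for task_id, blob in tasks:
+        key = hash(blob)  # stand-in for a content digest
+        blobs.setdefault(key, blob)
+        pointers[task_id] = key
+    return blobs, pointers
+```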
+
+### Enabling Deduplication
+
+After upgrading Aurora to 0.6.0, enable deduplication with the `-deduplicate_snapshots` flag.
+After the first snapshot the cluster will be using the deduplicated format to write to the
+replicated log. Snapshots are created periodically by the scheduler (according to
+the `-dlog_snapshot_interval` flag). An administrator can also force a snapshot operation with
+`aurora_admin snapshot`.
+
+### Disabling Deduplication
+
+To disable deduplication, for example to roll back to a version of Aurora earlier than 0.6.0,
+restart all of the cluster's schedulers with `-deduplicate_snapshots=false` and either wait for
+a snapshot or force one using `aurora_admin snapshot`.

Added: aurora/site/source/documentation/latest/security.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/security.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/security.md (added)
+++ aurora/site/source/documentation/latest/security.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,271 @@
+Aurora integrates with [Apache Shiro](http://shiro.apache.org/) to provide security
+controls for its API. In addition to providing some useful features out of the box, Shiro
+also allows Aurora cluster administrators to adapt the security system to their organization’s
+existing infrastructure.
+
+- [Enabling Security](#enabling-security)
+- [Authentication](#authentication)
+	- [HTTP Basic Authentication](#http-basic-authentication)
+		- [Server Configuration](#server-configuration)
+		- [Client Configuration](#client-configuration)
+	- [HTTP SPNEGO Authentication (Kerberos)](#http-spnego-authentication-kerberos)
+		- [Server Configuration](#server-configuration-1)
+		- [Client Configuration](#client-configuration-1)
+- [Authorization](#authorization)
+	- [Using an INI file to define security controls](#using-an-ini-file-to-define-security-controls)
+		- [Caveats](#caveats)
+- [Implementing a Custom Realm](#implementing-a-custom-realm)
+	- [Packaging a realm module](#packaging-a-realm-module)
+- [Known Issues](#known-issues)
+
+# Enabling Security
+
+There are two major components of security:
+[authentication and authorization](http://en.wikipedia.org/wiki/Authentication#Authorization).  A
+cluster administrator may choose the approach used for each, and may also implement custom
+mechanisms for either.  Later sections describe the options available.
+
+# Authentication
+
+The scheduler must be configured with instructions for how to process authentication
+credentials at a minimum.  There are currently two built-in authentication schemes -
+[HTTP Basic Authentication](http://en.wikipedia.org/wiki/Basic_access_authentication), and
+[SPNEGO](http://en.wikipedia.org/wiki/SPNEGO) (Kerberos).
+
+## HTTP Basic Authentication
+
+Basic Authentication is a very quick way to add *some* security.  It is supported
+by all major browsers and HTTP client libraries with minimal work.  However,
+before relying on Basic Authentication you should be aware of the [security
+considerations](http://tools.ietf.org/html/rfc2617#section-4).
+
+### Server Configuration
+
+At a minimum you need to set three command-line flags on the scheduler:
+
+```
+-http_authentication_mechanism=BASIC
+-shiro_realm_modules=INI_AUTHNZ
+-shiro_ini_path=path/to/security.ini
+```
+
+And create a security.ini file like so:
+
+```
+[users]
+sally = apple, admin
+
+[roles]
+admin = *
+```
+
+The details of the security.ini file are explained below. Note that this file contains plaintext,
+unhashed passwords.
+
+### Client Configuration
+
+To configure the client for HTTP Basic authentication, add an entry to ~/.netrc with your credentials:
+
+```
+% cat ~/.netrc
+# ...
+
+machine aurora.example.com
+login sally
+password apple
+
+# ...
+```
+
+No changes are required to `clusters.json`.
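+
+Since Basic auth is ubiquitous, any HTTP client library can exercise it directly. A quick
+sanity check of the credentials above (the host and endpoint are placeholders for your
+scheduler; this assumes the third-party `requests` library):
+
+```python
+import requests
+
+resp = requests.get("http://aurora.example.com:8081/scheduler",
+                    auth=("sally", "apple"))
+resp.raise_for_status()  # a 401 here means the credentials were rejected
+```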
+
+## HTTP SPNEGO Authentication (Kerberos)
+
+### Server Configuration
+At a minimum you need to set five command-line flags on the scheduler:
+
+```
+-http_authentication_mechanism=NEGOTIATE
+-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
+-kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
+-kerberos_server_keytab=path/to/aurora.example.com.keytab
+-shiro_ini_path=path/to/security.ini
+```
+
+And create a security.ini file like so:
+
+```
+% cat path/to/security.ini
+[users]
+sally = _, admin
+
+[roles]
+admin = *
+```
+
+What's going on here? First, Aurora must be configured to request Kerberos credentials when presented with an
+unauthenticated request. This is achieved by setting
+
+```
+-http_authentication_mechanism=NEGOTIATE
+```
+
+Next, a Realm module must be configured to **authenticate** the current request using the Kerberos
+credentials that were requested. Aurora ships with a realm module that can do this
+
+```
+-shiro_realm_modules=KERBEROS5_AUTHN[,...]
+```
+
+The Kerberos5Realm requires a keytab file and a server principal name. The principal name will usually
+be in the form `HTTP/aurora.example.com@EXAMPLE.COM`.
+
+```
+-kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
+-kerberos_server_keytab=path/to/aurora.example.com.keytab
+```
+
+The Kerberos5 realm module is authentication-only. For scheduler security to work you must also
+enable a realm module that provides an Authorizer implementation. For example, to do this using the
+IniShiroRealmModule:
+
+```
+-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
+```
+
+You can then configure authorization using a security.ini file as described below
+(the password field is ignored). You must configure the realm module with the path to this file:
+
+```
+-shiro_ini_path=path/to/security.ini
+```
+
+### Client Configuration
+To use Kerberos on the client-side you must build Kerberos-enabled client binaries. Do this with
+
+```
+./pants binary src/main/python/apache/aurora/kerberos:kaurora
+./pants binary src/main/python/apache/aurora/kerberos:kaurora_admin
+```
+
+You must also configure each cluster where you've enabled Kerberos on the scheduler
+to use Kerberos authentication. Do this by setting `auth_mechanism` to `KERBEROS`
+in `clusters.json`.
+
+```
+% cat ~/.aurora/clusters.json
+{
+    "devcluster": {
+        "auth_mechanism": "KERBEROS",
+        ...
+    },
+    ...
+}
+```
+
+# Authorization
+Given a means to authenticate the entity a client claims to be, we need to define what privileges it has.
+
+## Using an INI file to define security controls
+
+The simplest security configuration for Aurora is an INI file on the scheduler.  For small
+clusters, or clusters where the users and access controls change relatively infrequently, this is
+likely the preferred approach.  However you may want to avoid this approach if access permissions
+are rapidly changing, or if your access control information already exists in another system.
+
+You can enable INI-based configuration with the following scheduler command-line arguments:
+
+```
+-http_authentication_mechanism=BASIC
+-shiro_ini_path=path/to/security.ini
+```
+
+*Note:* As the argument name reveals, this is using Shiro’s
+[IniRealm](http://shiro.apache.org/configuration.html#Configuration-INIConfiguration) behind
+the scenes.
+
+The INI file will contain two sections - users and roles.  Here’s an example for what might
+be in security.ini:
+
+```
+[users]
+sally = apple, admin
+jim = 123456, accounting
+becky = letmein, webapp
+larry = 654321,accounting
+steve = password
+
+[roles]
+admin = *
+accounting = thrift.AuroraAdmin:setQuota
+webapp = thrift.AuroraSchedulerManager:*:webapp
+```
+
+The users section defines user credentials and the role(s) they are members of.  These lines
+are of the format `<user> = <password>[, <role>...]`.  As you probably noticed, the passwords are
+in plaintext and as a result read access to this file should be restricted.
+
+In this configuration, each user has different privileges for actions in the cluster because
+of the roles they are a part of:
+
+* admin is granted all privileges
+* accounting may adjust the amount of resource quota for any role
+* webapp represents a collection of jobs that make up a service, and its members may create and modify any jobs it owns (a toy sketch of the permission matching follows below)
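+
+The role values on the right-hand side are Shiro-style wildcard permissions. A heavily
+simplified matcher sketching the semantics (Shiro's real `WildcardPermission` also supports
+comma-separated sub-parts; `permits` is a made-up name):
+
+```python
+def permits(granted, required):
+    """Colon-separated parts; '*' matches any part, and a grant with
+    fewer parts implies everything beneath it (so 'admin = *' grants all)."""
+    g, r = granted.split(":"), required.split(":")
+    return len(g) <= len(r) and all(gp in ("*", rp) for gp, rp in zip(g, r))
+
+assert permits("*", "thrift.AuroraAdmin:setQuota")                # admin
+assert permits("thrift.AuroraSchedulerManager:*:webapp",
+               "thrift.AuroraSchedulerManager:killTasks:webapp")  # webapp
+assert not permits("thrift.AuroraAdmin:setQuota",
+                   "thrift.AuroraSchedulerManager:killTasks:webapp")
+```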
+
+### Caveats
+You might find documentation on the Internet suggesting there are additional sections in `shiro.ini`,
+like `[main]` and `[urls]`. These are not supported by Aurora as it uses a different mechanism to configure
+those parts of Shiro. Think of Aurora's `security.ini` as a subset with only `[users]` and `[roles]` sections.
+
+# Implementing a Custom Realm
+
+Since Aurora’s security is backed by [Apache Shiro](https://shiro.apache.org), you can implement a
+custom [Realm](http://shiro.apache.org/realm.html) to define organization-specific security behavior.
+
+In addition to using Shiro's standard APIs to implement a Realm you can link against Aurora to
+access the type-safe Permissions Aurora uses. See the Javadoc for `org.apache.aurora.scheduler.spi`
+for more information.
+
+## Packaging a realm module
+Package your custom Realm(s) with a Guice module that exposes a `Set<Realm>` multibinding.
+
+```java
+package com.example;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.multibindings.Multibinder;
+import org.apache.shiro.realm.Realm;
+
+public class MyRealmModule extends AbstractModule {
+  @Override
+  public void configure() {
+    Realm myRealm = new MyRealm();
+
+    Multibinder.newSetBinder(binder(), Realm.class).addBinding().toInstance(myRealm);
+  }
+
+  static class MyRealm implements Realm {
+    // Realm implementation.
+  }
+}
+```
+
+To use your module in the scheduler, include it as a realm module based on its fully-qualified
+class name:
+
+```
+-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ,com.example.MyRealmModule
+```
+
+# Known Issues
+
+While the APIs and SPIs we ship with are stable as of 0.8.0, we are aware of several possible
+incremental improvements. Please follow, vote, or send patches.
+
+Relevant tickets:
+* [AURORA-343](https://issues.apache.org/jira/browse/AURORA-343): HTTPS support
+* [AURORA-1248](https://issues.apache.org/jira/browse/AURORA-1248): Client retries 4xx errors
+* [AURORA-1279](https://issues.apache.org/jira/browse/AURORA-1279): Remove kerberos-specific build targets
+* [AURORA-1293](https://issues.apache.org/jira/browse/AURORA-1293): Consider defining a JSON format in place of INI
+* [AURORA-1179](https://issues.apache.org/jira/browse/AURORA-1179): Support hashed passwords in security.ini
+* [AURORA-1295](https://issues.apache.org/jira/browse/AURORA-1295): Support security for the ReadOnlyScheduler service

Added: aurora/site/source/documentation/latest/sla.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/sla.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/sla.md (added)
+++ aurora/site/source/documentation/latest/sla.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,177 @@
+Aurora SLA Measurement
+--------------
+
+- [Overview](#overview)
+- [Metric Details](#metric-details)
+  - [Platform Uptime](#platform-uptime)
+  - [Job Uptime](#job-uptime)
+  - [Median Time To Assigned (MTTA)](#median-time-to-assigned-\(mtta\))
+  - [Median Time To Running (MTTR)](#median-time-to-running-\(mttr\))
+- [Limitations](#limitations)
+
+## Overview
+
+The primary goal of the feature is collection and monitoring of Aurora job SLA (Service Level
+Agreement) metrics that define a contractual relationship between the Aurora/Mesos platform
+and hosted services.
+
+The Aurora SLA feature is by default only enabled for service (non-cron)
+production jobs (`"production = True"` in your `.aurora` config). It can be enabled for
+non-production services via the scheduler command line flag `-sla_non_prod_metrics`.
+
+Counters that track SLA measurements are computed periodically within the scheduler.
+The individual instance metrics are refreshed every minute (configurable via
+`sla_stat_refresh_interval`). The instance counters are subsequently aggregated by
+relevant grouping types before being exported to the scheduler's `/vars` endpoint (when using
+`vagrant`, that would be `http://192.168.33.7:8081/vars`).
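+
+A sketch of pulling just the SLA counters for one job from the JSON endpoint (assumes Python 3;
+`sla_metrics` is a made-up name, and `job_key` must match however the job key is rendered in
+your cluster's stat names):
+
+```python
+import json
+import urllib.request
+
+def sla_metrics(job_key, host="192.168.33.7", port=8081):
+    """Return all sla_<job_key>_* stats from /vars.json."""
+    stats = json.load(urllib.request.urlopen("http://%s:%d/vars.json" % (host, port)))
+    prefix = "sla_%s_" % job_key
+    return {k: v for k, v in stats.items() if k.startswith(prefix)}
+```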
+
+## Metric Details
+
+### Platform Uptime
+
+*Aggregate amount of time a job spends in a non-runnable state due to platform unavailability
+or scheduling delays. This metric tracks Aurora/Mesos uptime performance and reflects on any
+system-caused downtime events (tasks LOST or DRAINED). Any user-initiated task kills/restarts
+will not degrade this metric.*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_platform_uptime_percent`
+* Per cluster - `sla_cluster_platform_uptime_percent`
+
+**Units:** percent
+
+A fault in the task environment may cause Aurora and Mesos to have different views of the task state
+or to lose track of the task's existence. In such cases, the service task is marked as LOST and
+rescheduled by Aurora. For example, this may happen when the task stays in ASSIGNED or STARTING
+for too long or the Mesos slave becomes unhealthy (or disappears completely). The time between
+task entering LOST and its replacement reaching RUNNING state is counted towards platform downtime.
+
+Another example of a platform downtime event is the administrator-requested task rescheduling. This
+happens during planned Mesos slave maintenance when all slave tasks are marked as DRAINED and
+rescheduled elsewhere.
+
+To accurately calculate Platform Uptime, we must separate platform-incurred downtime from user
+actions that put a service instance in a non-operational state. It is simpler to isolate
+user-incurred downtime and treat all other downtime as platform incurred.
+
+Currently, a user can cause a healthy service (task) downtime in only two ways: via `killTasks`
+or `restartShards` RPCs. For both, their affected tasks leave an audit state transition trail
+relevant to uptime calculations. By applying a special "SLA meaning" to exposed task state
+transition records, we can build a deterministic downtime trace for every given service instance.
+
+A task going through a state transition carries one of three possible SLA meanings
+(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/#{git_tag}/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
+the SLA-to-task-state mapping):
+
+* Task is UP: starts a period where the task is considered to be up and running from the Aurora
+  platform standpoint.
+
+* Task is DOWN: starts a period where the task cannot reach the UP state for some
+  non-user-related reason. Counts towards instance downtime.
+
+* Task is REMOVED from SLA: starts a period where the task is not expected to be UP due to
+  user initiated action or failure. We ignore this period for the uptime calculation purposes.
+
+This metric is recalculated over the last sampling period (last minute) to account for
+any UP/DOWN/REMOVED events. It ignores any UP/DOWN events not immediately adjacent to the
+sampling interval as well as adjacent REMOVED events.
+
+### Job Uptime
+
+*Percentage of the job instances considered to be in RUNNING state for the specified duration
+relative to request time. This is a purely application-side metric that considers the aggregate
+uptime of all RUNNING instances. Any user- or platform-initiated restarts directly affect
+this metric.*
+
+**Collection scope:** We currently expose job uptime values at 5 pre-defined
+percentiles (50th, 75th, 90th, 95th, and 99th):
+
+* `sla_<job_key>_job_uptime_50_00_sec`
+* `sla_<job_key>_job_uptime_75_00_sec`
+* `sla_<job_key>_job_uptime_90_00_sec`
+* `sla_<job_key>_job_uptime_95_00_sec`
+* `sla_<job_key>_job_uptime_99_00_sec`
+
+**Units:** seconds
+
+You can also get customized real-time stats from the Aurora client. See `aurora sla -h` for
+more details.
+
+### Median Time To Assigned (MTTA)
+
+*Median time a job spends waiting for its tasks to be assigned to a host. This is a combined
+metric that helps track the dependency of scheduling performance on the requested resources
+(user scope) as well as the internal scheduler bin-packing algorithm efficiency (platform scope).*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_mtta_ms`
+* Per cluster - `sla_cluster_mtta_ms`
+* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in
+[ResourceAggregates.java](https://github.com/apache/aurora/blob/#{git_tag}/src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java):
+  * By CPU:
+    * `sla_cpu_small_mtta_ms`
+    * `sla_cpu_medium_mtta_ms`
+    * `sla_cpu_large_mtta_ms`
+    * `sla_cpu_xlarge_mtta_ms`
+    * `sla_cpu_xxlarge_mtta_ms`
+  * By RAM:
+    * `sla_ram_small_mtta_ms`
+    * `sla_ram_medium_mtta_ms`
+    * `sla_ram_large_mtta_ms`
+    * `sla_ram_xlarge_mtta_ms`
+    * `sla_ram_xxlarge_mtta_ms`
+  * By DISK:
+    * `sla_disk_small_mtta_ms`
+    * `sla_disk_medium_mtta_ms`
+    * `sla_disk_large_mtta_ms`
+    * `sla_disk_xlarge_mtta_ms`
+    * `sla_disk_xxlarge_mtta_ms`
+
+**Units:** milliseconds
+
+MTTA only considers instances that have already reached ASSIGNED state and ignores those
+that are still PENDING. This ensures straggler instances (e.g. with unreasonable resource
+constraints) do not affect metric curves.
+
+### Median Time To Running (MTTR)
+
+*Median time a job waits for its tasks to reach RUNNING state. This is a comprehensive metric
+reflecting the overall time it takes Aurora/Mesos to start executing user content.*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_mttr_ms`
+* Per cluster - `sla_cluster_mttr_ms`
+* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in
+[ResourceAggregates.java](https://github.com/apache/aurora/blob/#{git_tag}/src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java):
+  * By CPU:
+    * `sla_cpu_small_mttr_ms`
+    * `sla_cpu_medium_mttr_ms`
+    * `sla_cpu_large_mttr_ms`
+    * `sla_cpu_xlarge_mttr_ms`
+    * `sla_cpu_xxlarge_mttr_ms`
+  * By RAM:
+    * `sla_ram_small_mttr_ms`
+    * `sla_ram_medium_mttr_ms`
+    * `sla_ram_large_mttr_ms`
+    * `sla_ram_xlarge_mttr_ms`
+    * `sla_ram_xxlarge_mttr_ms`
+  * By DISK:
+    * `sla_disk_small_mttr_ms`
+    * `sla_disk_medium_mttr_ms`
+    * `sla_disk_large_mttr_ms`
+    * `sla_disk_xlarge_mttr_ms`
+    * `sla_disk_xxlarge_mttr_ms`
+
+**Units:** milliseconds
+
+MTTR only considers instances in RUNNING state. This ensures straggler instances (e.g. with
+unreasonable resource constraints) do not affect metric curves.
+
+## Limitations
+
+* The availability of Aurora SLA metrics is bound by the scheduler availability.
+
+* All metrics are calculated at a pre-defined interval (currently set at 1 minute).
+  Scheduler restarts may result in missed collections.

Added: aurora/site/source/documentation/latest/storage-config.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/storage-config.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/storage-config.md (added)
+++ aurora/site/source/documentation/latest/storage-config.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,142 @@
+# Storage Configuration And Maintenance
+
+- [Overview](#overview)
+- [Scheduler storage configuration flags](#scheduler-storage-configuration-flags)
+  - [Mesos replicated log configuration flags](#mesos-replicated-log-configuration-flags)
+    - [-native_log_quorum_size](#-native_log_quorum_size)
+    - [-native_log_file_path](#-native_log_file_path)
+    - [-native_log_zk_group_path](#-native_log_zk_group_path)
+  - [Backup configuration flags](#backup-configuration-flags)
+    - [-backup_interval](#-backup_interval)
+    - [-backup_dir](#-backup_dir)
+    - [-max_saved_backups](#-max_saved_backups)
+- [Recovering from a scheduler backup](#recovering-from-a-scheduler-backup)
+  - [Summary](#summary)
+  - [Preparation](#preparation)
+  - [Cleanup and re-initialize Mesos replicated log](#cleanup-and-re-initialize-mesos-replicated-log)
+  - [Restore from backup](#restore-from-backup)
+  - [Cleanup](#cleanup)
+
+## Overview
+
+This document summarizes Aurora storage configuration and maintenance details and is
+intended for use by anyone deploying and/or maintaining Aurora.
+
+For a high-level overview of the Aurora storage architecture, refer to [this document](/documentation/latest//).
+
+## Scheduler storage configuration flags
+
+Below is a summary of scheduler storage configuration flags that either don't have default values
+or require attention before deploying in a production environment.
+
+### Mesos replicated log configuration flags
+
+#### -native_log_quorum_size
+Defines the Mesos replicated log quorum size. See
+[the replicated log configuration document](/documentation/latest//)
+on how to choose the right value.
+
+#### -native_log_file_path
+Location of the Mesos replicated log files. Consider allocating a dedicated disk (preferably SSD)
+for Mesos replicated log files to ensure optimal storage performance.
+
+#### -native_log_zk_group_path
+ZooKeeper path used for Mesos replicated log quorum discovery.
+
+See [the code](https://github.com/apache/aurora/blob/#{git_tag}/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
+other available Mesos replicated log configuration options and default values.
+
+### Backup configuration flags
+
+Configuration options for the Aurora scheduler backup manager.
+
+#### -backup_interval
+The interval on which the scheduler writes local storage backups.  The default is every hour.
+
+#### -backup_dir
+Directory to write backups to.
+
+#### -max_saved_backups
+Maximum number of backups to retain before deleting the oldest backup(s).
+
+## Recovering from a scheduler backup
+
+- [Summary](#summary)
+- [Preparation](#preparation)
+- [Cleanup and re-initialize Mesos replicated log](#cleanup-and-re-initialize-mesos-replicated-log)
+- [Restore from backup](#restore-from-backup)
+- [Cleanup](#cleanup)
+
+**Be sure to read the entire page before attempting to restore from a backup, as it may have
+unintended consequences.**
+
+### Summary
+
+The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an
+earlier, backed up, version and requires all schedulers to be taken down temporarily while
+restoring. Once completed, the scheduler state resets to what it was when the backup was created.
+This means any jobs/tasks created or updated after the backup are unknown to the scheduler and will
+be killed shortly after the cluster restarts. All other tasks continue operating as normal.
+
+Usually, it is a bad idea to restore a backup that is not extremely recent (i.e. older than a few
+hours). This is because the scheduler will expect the cluster to look exactly as the backup does,
+so any tasks that have been rescheduled since the backup was taken will be killed.
+
+### Preparation
+
+Follow these steps to prepare the cluster for restoring from a backup:
+
+* Stop all scheduler instances
+
+* Consider blocking external traffic on the port defined by `-http_port` for all schedulers to
+prevent users from interacting with the scheduler during the restoration process. This aids
+troubleshooting by reducing scheduler log noise and prevents users from making changes that would
+be erased once the backup snapshot is restored
+
+* The next steps put the scheduler into a partially disabled state in which it can still accept
+storage recovery requests but cannot schedule tasks or change task states. Accomplish this by
+updating the following scheduler configuration options:
+  * Set `-mesos_master_address` to a non-existent ZooKeeper address. This prevents the scheduler
+    from registering with Mesos. E.g.: `-mesos_master_address=zk://localhost:2181`
+  * Set `-max_registration_delay` to a sufficiently long interval to prevent a registration timeout
+    and, as a result, scheduler suicide. E.g.: `-max_registration_delay=360mins`
+  * Make sure the `-reconciliation_initial_delay` option is set high enough (e.g.: `365days`) to
+    prevent accidental task GC. This is important, as the scheduler will attempt to reconcile the
+    cluster state and will kill all tasks when restarted with an empty Mesos replicated log.
+
+* Restart all schedulers
+
+### Cleanup and re-initialize Mesos replicated log
+
+Get rid of the corrupted files and re-initialize the Mesos replicated log:
+
+* Stop schedulers
+* Delete all files under `-native_log_file_path` on all schedulers
+* Initialize the Mesos replica's log file: `mesos-log initialize --path=<-native_log_file_path>`
+* Restart schedulers
+
+### Restore from backup
+
+At this point the scheduler is ready to rehydrate from the backup:
+
+* Identify the leading scheduler by:
+  * running `aurora_admin get_scheduler <cluster>`, if the scheduler is responsive
+  * examining scheduler logs
+  * or examining the ZooKeeper registration under the path defined by `-zk_endpoints`
+    and `-serverset_path`
+
+* Locate the desired backup file, copy it to the leading scheduler, and stage recovery by running
+the following command on the leader:
+`aurora_admin scheduler_stage_recovery <cluster> scheduler-backup-<yyyy-MM-dd-HH-mm>`
+
+* At this point, the recovery snapshot is staged and available for manual verification/modification
+via the `aurora_admin scheduler_print_recovery_tasks` and `scheduler_delete_recovery_tasks` commands.
+See `aurora_admin help <command>` for usage details.
+
+* Commit the recovery. This instructs the scheduler to overwrite the existing Mesos replicated log
+with the provided backup snapshot and initiate a mandatory failover:
+`aurora_admin scheduler_commit_recovery <cluster>`
+
+### Cleanup
+Undo any modifications made during the [Preparation](#preparation) sequence.
+

Added: aurora/site/source/documentation/latest/storage.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/storage.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/storage.md (added)
+++ aurora/site/source/documentation/latest/storage.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,88 @@
+# Aurora Scheduler Storage
+
+- [Overview](#overview)
+- [Reads, writes, modifications](#reads-writes-modifications)
+  - [Read lifecycle](#read-lifecycle)
+  - [Write lifecycle](#write-lifecycle)
+- [Atomicity, consistency and isolation](#atomicity-consistency-and-isolation)
+- [Population on restart](#population-on-restart)
+
+## Overview
+
+The Aurora scheduler maintains data that needs to be persisted to survive failovers and restarts.
+For example:
+
+* Task configurations and scheduled task instances
+* Job update configurations and update progress
+* Production resource quotas
+* Mesos resource offer host attributes
+
+Aurora solves its persistence needs by leveraging the Mesos implementation of a Paxos replicated
+log [[1]](https://ramcloud.stanford.edu/~ongaro/userstudy/paxos.pdf)
+[[2]](http://en.wikipedia.org/wiki/State_machine_replication) with key-value
+[LevelDB](https://github.com/google/leveldb) storage as the persistence medium.
+
+Conceptually, it can be represented by the following major components:
+
+* Volatile storage: in-memory cache of all available data. Implemented via in-memory
+[H2 Database](http://www.h2database.com/html/main.html) and accessed via
+[MyBatis](http://mybatis.github.io/mybatis-3/).
+* Log manager: the interface between Aurora storage and the Mesos replicated log. The default
+schema format is [thrift](https://github.com/apache/thrift). Data is stored in serialized binary form.
+* Snapshot manager: all data is periodically persisted to the Mesos replicated log in a single
+snapshot. This helps establish periodic recovery checkpoints and speeds up volatile storage recovery
+on restart.
+* Backup manager: as a precaution, snapshots are periodically written out into backup files.
+This addresses the [disaster recovery problem](/documentation/latest//)
+in case of complete loss or corruption of the Mesos log files.
+
+![Storage hierarchy](images/storage_hierarchy.png)
+
+## Reads, writes, modifications
+
+All services in Aurora access data via a set of predefined store interfaces (aka stores) logically
+grouped by the type of data they serve. Each interface defines the specific set of operations
+allowed on its data, abstracting away storage access and the actual persistence implementation. The
+latter is especially important given the general immutability of persisted data: with the Mesos
+replicated log as the underlying persistence solution, data can be read and appended easily but not
+modified in place. All modifications are simulated by saving new versions of modified objects. This
+property, together with general performance considerations, justifies the volatile in-memory store.
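+
+As a toy illustration of modification-as-new-version, here is a minimal Python sketch (purely
+illustrative names, not Aurora code) of how an append-only log simulates an update:
+
+```python
+log = []  # stands in for the append-only replicated log
+
+def save(task_id, config):
+    log.append((task_id, config))  # 'modifying' a task appends a new version
+
+def fetch(task_id):
+    # Readers resolve the newest appended version of the object.
+    for tid, config in reversed(log):
+        if tid == task_id:
+            return config
+    return None
+```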
+
+### Read lifecycle
+
+There are two types of reads available in Aurora: consistent and weakly-consistent. The difference
+is explained [below](#atomicity-consistency-and-isolation).
+
+All reads are served from volatile storage, making reads generally cheap operations from the
+performance standpoint. The majority of the volatile stores are represented by the in-memory H2
+database. This allows for rich schema definitions, queries and relationships that key-value storage
+is unable to match.
+
+### Write lifecycle
+
+Writes are more involved operations since, in addition to updating the volatile store, data has to
+be appended to the replicated log. Data is not available for reads until fully acknowledged by both
+the replicated log and volatile storage.
+
+## Atomicity, consistency and isolation
+
+Aurora uses [write-ahead logging](http://en.wikipedia.org/wiki/Write-ahead_logging) to ensure
+consistency between replicated and volatile storage. In Aurora, data is first written into the
+replicated log and only then updated in the volatile store.
+
+Aurora storage uses read-write locks to serialize data mutations and provide a consistent view of
+the available data. The `Storage` interface exposes three major types of operations:
+* `consistentRead` - access is guarded by a reader lock and provides a consistent view on read
+* `weaklyConsistentRead` - access is lock-free; delivers the best contention performance but may
+result in stale reads
+* `write` - access is fully serialized by a writer lock; operation success requires both
+volatile and replicated writes to succeed.
+
+The consistency of the volatile store is enforced via H2 transactional isolation.
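+
+Below is a toy Python sketch of the locking and write-ahead pattern described above. It is purely
+illustrative: Aurora's actual implementation is in Java, and since Python's standard library has no
+read-write lock, a single reentrant lock stands in for both the reader and writer locks:
+
+```python
+import threading
+
+class ToyStorage(object):
+    def __init__(self):
+        self._lock = threading.RLock()  # stand-in for a read-write lock
+        self._log = []                  # append-only stand-in for the replicated log
+        self._volatile = {}             # in-memory view (H2 in the real scheduler)
+
+    def write(self, key, value):
+        with self._lock:                    # writer lock: mutations are serialized
+            self._log.append((key, value))  # write-ahead: persist to the log first...
+            self._volatile[key] = value     # ...then update the volatile store
+
+    def consistent_read(self, key):
+        with self._lock:                    # reader lock in the real implementation
+            return self._volatile.get(key)
+
+    def weakly_consistent_read(self, key):
+        return self._volatile.get(key)      # lock-free; may observe stale data
+```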
+
+## Population on restart
+
+Any time a scheduler restarts, it restores its volatile state from the most recent position recorded
+in the replicated log by restoring the snapshot and replaying individual log entries on top to fully
+recover the state up to the last write.
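+
+A minimal sketch of that recovery sequence, under the assumption that the log exposes the latest
+snapshot plus the entries appended after it (hypothetical structure, not Aurora's actual API):
+
+```python
+class ToyLog(object):
+    def __init__(self):
+        self.snapshot = {}   # last full snapshot of the volatile state
+        self.entries = []    # (key, value) writes appended after the snapshot
+
+def recover(log):
+    volatile = dict(log.snapshot)   # rebuild state from the snapshot...
+    for key, value in log.entries:  # ...then replay later writes on top
+        volatile[key] = value
+    return volatile
+```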
+

Added: aurora/site/source/documentation/latest/test-resource-generation.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/test-resource-generation.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/test-resource-generation.md (added)
+++ aurora/site/source/documentation/latest/test-resource-generation.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,24 @@
+# Generating test resources
+
+## Background
+The Aurora source repository and distributions contain several
+[binary files](https://github.com/apache/aurora/blob/#{git_tag}/src/test/resources/org/apache/thermos/root/checkpoints) to
+qualify the backwards compatibility of thermos with checkpoint data. Since
+thermos persists state to disk, to be read by the thermos observer, it is important that we have
+tests that prevent regressions affecting the ability to parse previously-written data.
+
+## Generating test files
+The files included represent persisted checkpoints that exercise different
+features of thermos. The existing files should not be modified unless
+we are accepting backwards incompatibility, such as with a major release.
+
+It is not practical to write source code to generate these files on the fly,
+as source would be vulnerable to drift (e.g. due to refactoring) in ways
+that would undermine the goal of ensuring backwards compatibility.
+
+The most common reason to add a new checkpoint file would be to provide
+coverage for new thermos features that alter the data format. This is
+accomplished by writing and running a
+[job configuration](/documentation/latest//) that exercises the feature, and
+copying the checkpoint file from the sandbox directory; by default this is
+`/var/run/thermos/checkpoints/<aurora task id>`.
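+
+The copy step might look like the following Python sketch (a hypothetical helper: the source path
+follows the default mentioned above, and the destination is the checkpoint resource directory in
+the source tree):
+
+```python
+import os
+import shutil
+
+def copy_checkpoint(task_id,
+                    dest='src/test/resources/org/apache/thermos/root/checkpoints'):
+    # Source path follows the default sandbox location mentioned above.
+    src = os.path.join('/var/run/thermos/checkpoints', task_id)
+    if os.path.isdir(src):    # checkpoint data may be a directory of files...
+        shutil.copytree(src, os.path.join(dest, task_id))
+    else:                     # ...or a single checkpoint file
+        shutil.copy(src, dest)
+```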

Added: aurora/site/source/documentation/latest/thrift-deprecation.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/thrift-deprecation.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/thrift-deprecation.md (added)
+++ aurora/site/source/documentation/latest/thrift-deprecation.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,50 @@
+# Thrift API Changes
+
+## Overview
+Aurora uses [Apache Thrift](https://thrift.apache.org/) for representing structured data in its
+client/server RPC protocol as well as for internal data storage. While Thrift is capable of
+correctly handling additions and renames of existing members, field removals must be done
+carefully to ensure backwards compatibility and provide a predictable deprecation cycle. This
+document describes general guidelines for making Thrift schema changes to existing fields in
+[api.thrift](https://github.com/apache/aurora/blob/#{git_tag}/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+
+It is highly recommended to go through
+[Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to brush up
+on basic Thrift schema concepts.
+
+## Checklist
+Every existing Thrift schema modification is unique in its requirements and must be analyzed
+carefully to identify its scope and expected consequences. The following checklist may help in that
+analysis:
+* Is this a new field/struct? If yes, go ahead
+* Is this a pure field/struct rename without any type/structure change? If yes, go ahead and rename
+* Anything else, read further to make sure your change is properly planned
+
+## Deprecation cycle
+Any time a breaking change (e.g.: field replacement or removal) is required, the following cycle
+must be followed:
+
+### vCurrent
+Apply the change in a way that does not prevent the scheduler/client at this version from
+communicating with the scheduler/client from vCurrent-1.
+* Do not remove or rename the old field
+* Add a new field as an eventual replacement for the old one and implement dual read/write
+anywhere the old field is used (a sketch of this pattern follows the list)
+* Check [storage.thrift](https://github.com/apache/aurora/blob/#{git_tag}/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if the
+affected struct is stored in Aurora scheduler storage. If so, you most likely need to backfill
+existing data to ensure both fields are populated eagerly on startup.
+See [StorageBackfill.java](https://github.com/apache/aurora/blob/#{git_tag}/src/main/java/org/apache/aurora/scheduler/storage/StorageBackfill.java)
+* Add a deprecation JIRA ticket to the vCurrent+1 release candidate
+* Add a TODO for the deprecated field referencing the JIRA ticket
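+
+A toy Python sketch of the dual read/write pattern (Aurora's real implementation is Java operating
+on generated Thrift structs; `TaskConfig`, `old_field` and `new_field` are placeholder names):
+
+```python
+class TaskConfig(object):   # placeholder for a generated Thrift struct
+    def __init__(self):
+        self.old_field = None  # deprecated, kept for vCurrent-1 readers
+        self.new_field = None  # replacement field
+
+def write_compat(task, value):
+    task.new_field = value
+    task.old_field = value  # dual write: vCurrent-1 can still read the data
+
+def read_compat(task):
+    # Dual read: prefer the replacement, fall back to the deprecated field.
+    if task.new_field is not None:
+        return task.new_field
+    return task.old_field
+```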
+
+### vCurrent+1
+Finalize the change by removing the deprecated fields from the Thrift schema.
+* Drop any dual read/write routines added in the previous version
+* Remove the deprecated Thrift field
+
+## Testing
+It's always advisable to test your changes in the local vagrant environment to build more
+confidence that your change is backwards compatible. It's easy to simulate different
+client/scheduler versions by playing with the `aurorabuild` command. See [this document](/documentation/latest//)
+for more details.
+

Added: aurora/site/source/documentation/latest/tutorial.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/tutorial.md?rev=1719755&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/tutorial.md (added)
+++ aurora/site/source/documentation/latest/tutorial.md Sun Dec 13 00:06:36 2015
@@ -0,0 +1,265 @@
+Aurora Tutorial
+---------------
+
+- [Introduction](#introduction)
+- [Setup: Install Aurora](#setup-install-aurora)
+- [The Script](#the-script)
+- [Aurora Configuration](#aurora-configuration)
+- [What's Going On In That Configuration File?](#whats-going-on-in-that-configuration-file)
+- [Creating the Job](#creating-the-job)
+- [Watching the Job Run](#watching-the-job-run)
+- [Cleanup](#cleanup)
+- [Next Steps](#next-steps)
+
+## Introduction
+
+This tutorial shows how to use the Aurora scheduler to run (and
+"`printf-debug`") a hello world program on Mesos. The operational
+hierarchy is:
+
+- Aurora manages and schedules jobs for Mesos to run.
+- Mesos manages the individual tasks that make up a job.
+- Thermos manages the individual processes that make up a task.
+
+This is the recommended first document for new Aurora users to read to start
+getting up to speed on the system.
+
+To get help, email questions to the Aurora Developer List,
+[dev@aurora.apache.org](mailto:dev@aurora.apache.org)
+
+## Setup: Install Aurora
+
+You use the Aurora client and web UI to interact with Aurora jobs. To
+install it locally, see [vagrant.md](/documentation/latest//). The remainder of this
+Tutorial assumes you are running Aurora using Vagrant.  Unless otherwise stated,
+all commands are to be run from the root of the aurora repository clone.
+
+## The Script
+
+Our "hello world" application is a simple Python script that loops
+forever, displaying the time every few seconds. Copy the code below and
+put it in a file named `hello_world.py` in the root of your Aurora repository clone (Note:
+this directory is the same as `/vagrant` inside the Vagrant VMs).
+
+The script has an intentional bug, which we will explain later on.
+
+<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
+-->
+```python
+import sys
+import time
+
+def main(argv):
+  SLEEP_DELAY = 10
+  # Python ninjas - ignore this blatant bug.
+  for i in xrang(100):
+    print("Hello world! The time is now: %s. Sleeping for %d secs" % (
+      time.asctime(), SLEEP_DELAY))
+    sys.stdout.flush()
+    time.sleep(SLEEP_DELAY)
+
+if __name__ == "__main__":
+  main(sys.argv)
+```
+
+## Aurora Configuration
+
+Once we have our script/program, we need to create a *configuration
+file* that tells Aurora how to manage and launch our Job. Save the below
+code in the file `hello_world.aurora`.
+
+<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
+-->
+```python
+pkg_path = '/vagrant/hello_world.py'
+
+# we use a trick here to make the configuration change with
+# the contents of the file, for simplicity.  in a normal setting, packages would be
+# versioned, and the version number would be changed in the configuration.
+import hashlib
+with open(pkg_path, 'rb') as f:
+  pkg_checksum = hashlib.md5(f.read()).hexdigest()
+
+# copy hello_world.py into the local sandbox
+install = Process(
+  name = 'fetch_package',
+  cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum))
+
+# run the script
+hello_world = Process(
+  name = 'hello_world',
+  cmdline = 'python hello_world.py')
+
+# describe the task
+hello_world_task = SequentialTask(
+  processes = [install, hello_world],
+  resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB))
+
+jobs = [
+  Service(cluster = 'devcluster',
+          environment = 'devel',
+          role = 'www-data',
+          name = 'hello_world',
+          task = hello_world_task)
+]
+```
+
+For more about Aurora configuration files, see the [Configuration
+Tutorial](/documentation/latest//) and the [Aurora + Thermos
+Reference](/documentation/latest//) (preferably after finishing this
+tutorial).
+
+## What's Going On In That Configuration File?
+
+More than you might think.
+
+1. From a "big picture" viewpoint, it first defines two
+Processes. Then it defines a Task that runs the two Processes in the
+order specified in the Task definition, as well as specifying what
+computational and memory resources are available for them.  Finally,
+it defines a Job that will schedule the Task on available and suitable
+machines. This Job is the sole member of a list of Jobs; you can
+specify more than one Job in a config file.
+
+2. At the Process level, it specifies how to get your code into the
+local sandbox in which it will run. It then specifies how the code is
+actually run once the second Process starts.
+
+## Creating the Job
+
+We're ready to launch our job! To do so, we use the Aurora Client to
+issue a Job creation request to the Aurora scheduler.
+
+Many Aurora Client commands take a *job key* argument, which uniquely
+identifies a Job. A job key consists of four parts, each separated by a
+"/". The four parts are `<cluster>/<role>/<environment>/<jobname>`
+in that order. Two job keys that differ in any of the four parts identify
+two separate jobs; only if all four values are identical do they identify
+the same job. For example, `devcluster/www-data/devel/hello_world` and
+`devcluster/www-data/test/hello_world` name two different jobs because
+their environments differ.
+
+`/etc/aurora/clusters.json` within the Aurora scheduler lists the available
+cluster names. For Vagrant, from the top-level of your Aurora repository clone,
+do:
+
+    $ vagrant ssh
+
+Followed by:
+
+    vagrant@precise64:~$ cat /etc/aurora/clusters.json
+
+You'll see something like:
+
+```javascript
+[{
+  "name": "devcluster",
+  "zk": "192.168.33.7",
+  "scheduler_zk_path": "/aurora/scheduler",
+  "auth_mechanism": "UNAUTHENTICATED"
+}]
+```
+
+Use a `name` value from that file as your job key's cluster part.
+
+Role names are user accounts existing on the slave machines. If you don't know what accounts
+are available, contact your sysadmin.
+
+Environment names are namespaces; you can count on `prod`, `devel` and `test` existing.
+
+The Aurora Client command that actually runs our Job is `aurora job create`. It creates a Job as
+specified by its job key and configuration file arguments and runs it.
+
+    aurora job create <cluster>/<role>/<environment>/<jobname> <config_file>
+
+Or for our example:
+
+    aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+
+This returns:
+
+    $ vagrant ssh
+    Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-generic x86_64)
+
+     * Documentation:  https://help.ubuntu.com/
+    Welcome to your Vagrant-built virtual machine.
+    Last login: Fri Jan  3 02:18:55 2014 from 10.0.2.2
+    vagrant@precise64:~$ aurora job create devcluster/www-data/devel/hello_world \
+        /vagrant/hello_world.aurora
+     INFO] Creating job hello_world
+     INFO] Response from scheduler: OK (message: 1 new tasks pending for job
+      www-data/devel/hello_world)
+     INFO] Job url: http://precise64:8081/scheduler/www-data/devel/hello_world
+
+## Watching the Job Run
+
+Now that our job is running, let's see what it's doing. Access the
+scheduler web interface at `http://$scheduler_hostname:$scheduler_port/scheduler`,
+or, when using `vagrant`, at `http://192.168.33.7:8081/scheduler`.
+First we see what Jobs are scheduled:
+
+![Scheduled Jobs](images/ScheduledJobs.png)
+
+Click on your user name, which in this case was `www-data`, and we see the Jobs associated
+with that role:
+
+![Role Jobs](images/RoleJobs.png)
+
+If you click on your `hello_world` Job, you'll see:
+
+![hello_world Job](images/HelloWorldJob.png)
+
+Oops, looks like our first job didn't quite work! The task failed, so we have
+to figure out what went wrong.
+
+Access the page for our Task by clicking on its host.
+
+![Task page](images/TaskBreakdown.png)
+
+Once there, we see that the
+`hello_world` process failed. The Task page captures the standard error and
+standard output streams and makes them available. Clicking through
+to `stderr` on the failed `hello_world` process, we see what happened.
+
+![stderr page](images/stderr.png)
+
+It looks like we made a typo in our Python script. We wanted `xrange`,
+not `xrang`. Edit the `hello_world.py` script to use the correct function and
+we will try again.
+
+    aurora job update devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+
+This time, the task comes up, we inspect the page, and see that the
+`hello_world` process is running.
+
+![Running Task page](images/runningtask.png)
+
+We then inspect the output by clicking on `stdout` and see our process'
+output:
+
+![stdout page](images/stdout.png)
+
+## Cleanup
+
+Now that we're done, we kill the job using the Aurora client:
+
+    vagrant@precise64:~$ aurora job killall devcluster/www-data/devel/hello_world
+     INFO] Killing tasks for job: devcluster/www-data/devel/hello_world
+     INFO] Response from scheduler: OK (message: Tasks killed.)
+     INFO] Job url: http://precise64:8081/scheduler/www-data/devel/hello_world
+    vagrant@precise64:~$
+
+The job page now shows the `hello_world` tasks as completed.
+
+![Killed Task page](images/killedtask.png)
+
+## Next Steps
+
+Now that you've finished this Tutorial, you should read or do the following:
+
+- [The Aurora Configuration Tutorial](/documentation/latest//), which provides more examples
+  and best practices for writing Aurora configurations. You should also look at
+  the [Aurora + Thermos Configuration Reference](/documentation/latest//).
+- The [Aurora User Guide](/documentation/latest//) provides an overview of how Aurora, Mesos, and
+  Thermos work "under the hood".
+- Explore the Aurora Client - use `aurora -h`, and read the
+  [Aurora Client Commands](/documentation/latest//) document.


