aurora-commits mailing list archives

From dles...@apache.org
Subject svn commit: r1623604 [2/2] - in /incubator/aurora/site: ./ publish/documentation/latest/client-commands/ publish/documentation/latest/clientv2/ publish/documentation/latest/committers/ publish/documentation/latest/configuration-reference/ publish/docum...
Date Tue, 09 Sep 2014 00:47:20 GMT
Modified: incubator/aurora/site/publish/documentation/latest/vagrant/index.html
URL: http://svn.apache.org/viewvc/incubator/aurora/site/publish/documentation/latest/vagrant/index.html?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/publish/documentation/latest/vagrant/index.html (original)
+++ incubator/aurora/site/publish/documentation/latest/vagrant/index.html Tue Sep  9 00:47:19 2014
@@ -65,22 +65,55 @@
 <!-- /breadcrumb -->
 	
       <div class="container">
-        <p>Aurora includes a <code>Vagrantfile</code> that defines a full Mesos cluster running Aurora. You can use it to
-explore Aurora&rsquo;s various components. To get started, install
-<a href="https://www.virtualbox.org/">VirtualBox</a> and <a href="http://www.vagrantup.com/">Vagrant</a>,
-then run <code>vagrant up</code> somewhere in the repository source tree to create a team of VMs.  This may take some time initially as it builds all
-the components involved in running an aurora cluster.</p>
-
-<p>The scheduler is listening on <a href="http://192.168.33.7:8081/scheduler">http://192.168.33.7:8081/scheduler</a>
-The observer is listening on <a href="http://192.168.33.7:1338">http://192.168.33.7:1338</a>
-The master is listening on <a href="http://192.168.33.7:5050">http://192.168.33.7:5050</a></p>
+        <h1 id="getting-started">Getting Started</h1>
 
-<p>Once everything is up, you can <code>vagrant ssh devcluster</code> and execute aurora client commands using the <code>aurora</code> client.</p>
+<p>To replicate a real cluster environment as closely as possible, we use
+<a href="http://www.vagrantup.com/">Vagrant</a> to launch a complete Aurora cluster in a virtual machine.</p>
 
-<h2 id="troubleshooting">Troubleshooting</h2>
+<h2 id="prerequisites">Prerequisites</h2>
+
+<ul>
+<li><a href="https://www.virtualbox.org/">VirtualBox</a></li>
+<li><a href="http://www.vagrantup.com/">Vagrant</a></li>
+<li>A clone of the Aurora repository, or source distribution.</li>
+</ul>
+
+<p>You can start a local cluster by running:</p>
+<pre class="highlight text">vagrant up
+</pre>
+<p>Once started, several services should be running:</p>
+
+<ul>
+<li>scheduler is listening on <a href="http://192.168.33.7:8081">http://192.168.33.7:8081</a></li>
+<li>observer is listening on <a href="http://192.168.33.7:1338">http://192.168.33.7:1338</a></li>
+<li>master is listening on <a href="http://192.168.33.7:5050">http://192.168.33.7:5050</a></li>
+<li>slave is listening on <a href="http://192.168.33.7:5051">http://192.168.33.7:5051</a></li>
+</ul>
+
+<p>You can SSH into the machine with <code>vagrant ssh</code> and execute Aurora client commands using the
+<code>aurora</code> command.  A pre-installed <code>clusters.json</code> file refers to your local cluster as
+<code>devcluster</code>, which you will use in client commands.</p>
+
+<h1 id="deleting-your-local-cluster">Deleting your local cluster</h1>
+
+<p>Once you are finished with your local cluster, or if you would otherwise like to start from scratch,
+you can use the command <code>vagrant destroy</code> to shut down the virtual machine and delete its virtual file system.</p>
+
+<h1 id="rebuilding-components">Rebuilding components</h1>
+
+<p>If you are changing Aurora code and would like to rebuild a component, you can use the <code>aurorabuild</code>
+command on your vagrant machine to build and restart a component.  This is considerably faster than
+destroying and rebuilding your VM.</p>
+
+<p><code>aurorabuild</code> accepts a list of components to build and update.  You may invoke the command with
+no arguments to get a list of supported components.</p>
+<pre class="highlight text"> vagrant ssh -c &#39;aurorabuild client&#39;
+</pre>
+<h1 id="troubleshooting">Troubleshooting</h1>
 
 <p>Most of the vagrant related problems can be fixed by the following steps:
 * Destroying the vagrant environment with <code>vagrant destroy</code>
+* Killing any orphaned VMs (see AURORA-499) with the <code>virtualbox</code> UI or the <code>VBoxManage</code> command-line tool
 * Cleaning the repository of build artifacts and other intermediate output with <code>git clean -fdx</code>
 * Bringing up the vagrant environment with <code>vagrant up</code></p>
 
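The service list in the hunk above lends itself to a quick reachability check after `vagrant up`. The following is a minimal sketch, not part of the Aurora tooling: `SERVICES`, `is_listening`, and `check_services` are hypothetical names, and the addresses are simply the ones listed in the documentation, so they may differ if your `Vagrantfile` does.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Addresses copied from the documentation above (assumed unchanged locally).
SERVICES = {
    "scheduler": "http://192.168.33.7:8081",
    "observer": "http://192.168.33.7:1338",
    "master": "http://192.168.33.7:5050",
    "slave": "http://192.168.33.7:5051",
}


def is_listening(url, fetch=urlopen, timeout=2):
    """Return True if an HTTP request to `url` gets any response at all."""
    try:
        fetch(url, timeout=timeout).close()
        return True
    except HTTPError:
        # The server answered, even if with an error status.
        return True
    except OSError:
        # Connection refused, unreachable host, timeout, and similar.
        return False


def check_services(fetch=urlopen, timeout=2):
    """Map each service name to whether its endpoint answered."""
    return {name: is_listening(url, fetch, timeout)
            for name, url in SERVICES.items()}
```

Calling `check_services()` on the host machine returns a dictionary of service names to booleans. The `fetch` parameter exists only so the check can be exercised without a running cluster.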

Modified: incubator/aurora/site/source/documentation/latest/client-commands.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/client-commands.md?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/source/documentation/latest/client-commands.md (original)
+++ incubator/aurora/site/source/documentation/latest/client-commands.md Tue Sep  9 00:47:19 2014
@@ -319,7 +319,7 @@ In addition to the required job key argu
 - `--restart_threshold`: Defaults to `60`, the maximum number of
   seconds before a shard must move into the `RUNNING` state before
   it's considered a failure.
-- `--watch_secs`: Defaults to `30`, the minimum number of seconds a
+- `--watch_secs`: Defaults to `45`, the minimum number of seconds a
   shard must remain in `RUNNING` state before considered a success.
 
 Cron Jobs
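
The two thresholds in the hunk above can be read as a simple pass/fail rule per shard. The sketch below is one interpretation of the documented wording, not the updater's actual code: `classify_shard` and its parameters are hypothetical, and the treatment of the exact boundary values is an assumption.

```python
RESTART_THRESHOLD = 60  # documented default for --restart_threshold
WATCH_SECS = 45         # documented default for --watch_secs (after this commit)


def classify_shard(running_at, left_running_at=None,
                   restart_threshold=RESTART_THRESHOLD,
                   watch_secs=WATCH_SECS):
    """Classify one restarted shard as "success" or "failure".

    running_at: seconds after the restart at which the shard entered
        RUNNING, or None if it never did.
    left_running_at: seconds after the restart at which the shard left
        RUNNING, or None if it stayed in RUNNING.
    """
    # The shard must reach RUNNING within restart_threshold seconds.
    if running_at is None or running_at > restart_threshold:
        return "failure"
    # It must then stay in RUNNING for at least watch_secs seconds.
    if left_running_at is not None and left_running_at - running_at < watch_secs:
        return "failure"
    return "success"
```

With the defaults, a shard that reaches `RUNNING` after 10 seconds and exits at 40 seconds is a failure (30 seconds in `RUNNING`), while under the old default of 30 for `watch_secs` the same shard would have counted as a success.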

Added: incubator/aurora/site/source/documentation/latest/clientv2.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/clientv2.md?rev=1623604&view=auto
==============================================================================
--- incubator/aurora/site/source/documentation/latest/clientv2.md (added)
+++ incubator/aurora/site/source/documentation/latest/clientv2.md Tue Sep  9 00:47:19 2014
@@ -0,0 +1,405 @@
+Aurora Client v2
+=================
+
+Overview
+-----------
+
+Our goal is to replace the current Aurora command-line client. The
+current client suffers from an early monolithic structure, and a long
+development history of rapid unplanned evolution.
+
+In addition to its internal problems, the current Aurora client is
+confusing for users. There are several different kinds of objects
+manipulated by the Aurora command line, and the difference between
+them is often not clear. (What's the difference between a job and a
+configuration?) For each type of object, there are different commands,
+and it's hard to remember which command should be used for which kind
+of object.
+
+Instead of continuing to let the Aurora client develop and evolve
+randomly, it's time to take a principled look at the Aurora command
+line, and figure out how to make our command line processing make
+sense. At the same time, the code needs to be cleaned up, and divided
+into small comprehensible units based on a plugin architecture.
+
+Doing this now will give us a more intuitive, consistent, and
+easy-to-use client, as well as a sound platform for future development.
+
+Goals
+-------
+
+* A command line tool for interacting with Aurora that is easy for
+  users to understand.
+* A noun/verb command model.
+* A modular source-code architecture.
+* Non-disruptive transition for users.
+
+Non-Goals
+----------
+
+* The most important non-goal is that we're not trying to redesign the
+  Aurora scheduler, the Aurora executor, or any of the peripheral tools
+  that the Aurora command line interacts with; we only want to create a
+  better command line client.
+* We do not want to change thermos, mesos, hadoop, etc.
+* We do not want to create new objects that users will work with to
+  interact with Mesos or Aurora.
+* We do not want to change Aurora job configuration files or file formats.
+* We do not want to change the Aurora API.
+* We don't want to boil the ocean: there are many things that we could
+  include in the scope of this project, but we don't want to be
+  distracted by re-implementing all of twitter.commons in order to
+  create a perfect Aurora client.
+
+
+Background
+-----------
+
+Aurora is a system that's used to run and manage services and
+service-like jobs running in a datacenter. Aurora takes care of
+allocating resources in order to schedule and run jobs without
+requiring teams to manage dedicated hardware. The heart of Aurora is
+called the scheduler, and is responsible for finding and assigning
+resources to tasks.
+
+The Aurora scheduler provides a thrift API. The scheduler API is
+low-level and difficult to interact with. Users do not interact
+directly with the Aurora API; instead, they use a command-line tool,
+which provides a collection of easy-to-use commands. This command-line
+tool, in turn, talks to the scheduler API to launch and manage jobs in
+datacenter clusters. The command-line tool is called the Aurora
+client.
+
+The current implementation of the Aurora client is haphazard,
+and really needs to be cleaned up:
+
+- The code is monolithic and hard to maintain. It's implemented using
+  `twitter.common.app`, which assumes that all of the command code lives
+  in a single source file. To work around this, and allow some
+  subdivision, it uses a hack of `twitter.common.app` to force
+  registration of commands from multiple modules. It's hard to
+  understand, and hard to modify.
+- The current code is very difficult to test. Because of the way it's
+  built, there is no consistent way of passing key application data
+  around. As a result, each unit test of client operations needs a
+  difficult-to-assemble custom setup of mock objects.
+- The current code handles errors poorly, and it is difficult to
+  fix. Many common errors produce unacceptable results. For example,
+  issuing an unknown command generates an error message "main takes 0
+  parameters but received 1"; passing an invalid parameter to other
+  commands frequently produces a stack trace.
+- The current command line is confusing for users. There are several
+  different kinds of objects manipulated by the Aurora command line,
+  and the difference between them is often not entirely clear. (What's
+  the difference between a job and a configuration?)
+  For each type of object, there are different
+  commands, and it's frequently not clear just which command should be
+  used for which object.
+
+
+Instead of continuing to let it develop and evolve randomly, it's time
+to take a principled look at the Aurora command line, and figure out
+how to make command line processing make sense. At the same time, the
+code needs to be cleaned up, and divided into small comprehensible
+units based on a plugin architecture.
+
+Requirements
+-------------
+
+Aurora is aimed at engineers who run jobs and services in a
+datacenter. As a result, the requirements for the Aurora client are
+all engineering focused:
+
+* __Consistency__: commands should follow a consistent structure, so that
+  users can apply knowledge and intuition gained from working with
+  some Aurora commands to new commands. This means that when commands
+  can re-use the same options, they should, and that objects should be
+  referred to by consistent syntax throughout the tool.
+* __Helpfulness__: commands should be structured so that the system can
+  generate helpful error messages. If a user just runs "aurora", they
+  should get a basic usage message. If they try to run an invalid
+  command, they should get a message that the command is invalid, not
+  a stack dump or "command main() takes 0 parameters but received
+  2". Commands should not generate extraneous output that obscures the
+  key facts that the user needs to know, and the default behavior of
+  commands should not generate outputs that will be routinely ignored
+  by users.
+* __Extensibility__: it should be easy to plug in new commands,
+  including custom commands, to adapt the Aurora client to new
+  environments.
+* __Script-friendly command output__: every command should at least include
+  an option that generates output that's script-friendly. Scripts should be
+  able to work with command-output without needing to do screen scraping.
+* __Scalability__: the tools should be usable for any foreseeable size
+  of Aurora datacenters and machine clusters.
+
+Design Overview
+-----------------
+
+The Aurora client will be reimplemented using a noun-verb model,
+similar to the cmdlet model used by Windows PowerShell (formerly Monad). Users
+will work by providing a noun for the type of object being operated
+on, and a verb for the specific operation being performed on the
+object, followed by parameters. For example, to create a job, the user
+would execute: "`aurora job create smfd/mchucarroll/devel/jobname
+job.aurora`". The noun is `job` and the verb is `create`.
+
+The client will be implemented following that noun-verb
+convention. Each noun will be a separate component, which can be
+registered into the command-line framework. Each verb will be
+implemented by a class that registers with the appropriate noun. Nouns
+and verbs will each provide methods that add their command line
+options and parameters to the options parser, using the Python
+argparse library.
+
+Detailed Design
+-----------------
+
+### Interface
+
+In this section, we'll walk through the types of objects that the
+client can manipulate, and the operations that need to be provided for
+each object. These form the primary interface that engineers will use
+to interact with Aurora.
+
+In the command-line, each of the object types will have an Aurora
+subcommand. The commands to manipulate the object type will follow the
+type. For example, here are several commands in the old syntax
+contrasted against the new noun/verb syntax.
+
+* Get quota for a role:
+   * Noun/Verb syntax:  `aurora quota get west/www-data`
+   * Old syntax: `aurora get_quota --cluster=smf1 www-data`
+* Create job:
+   * Noun/Verb syntax: `aurora job create west/www-data/test/job job.aurora`
+   * Old syntax: `aurora create west/www-data/test/job job.aurora`
+* Schedule a job to run at a specific interval:
+   * Noun/verb: `aurora cron schedule east/www-data/test/job job.aurora`
+   * Old: `aurora create east/www-data/test/job job.aurora`
+
+As you can see in these examples, the new syntax is more consistent:
+you always specify the cluster where a command executes as part of an
+identifier, whereas in the old syntax, it was sometimes part of the
+jobkey and sometimes specified with a "--cluster" option.
+
+The new syntax is also clearer and more explicit: even without knowing
+much about Aurora, it's clear what objects each command is acting on,
+whereas in the old syntax, commands like "create" are unclear.
+
+### The Job Noun
+
+A job is a configured program ready to run in Aurora. A job is,
+conceptually, a task factory: when a job is submitted to the Aurora
+scheduler, it creates a collection of tasks. The job contains a
+complete description of everything it needs to create a collection of
+tasks. (Note that this subsumes "service" commands. A service is just
+a task whose configuration sets the is_service flag, so we don't have
+separate commands for working with services.) Jobs are specified using
+`cluster/role/env/name` jobkey syntax.
+
+* `aurora job create *jobkey* *config*`:  submits a job to a cluster, launching the task(s) specified by the job config.
+* `aurora job status *jobkey*`: query job status. Prints information about the job,
+  whether it's running, etc., to standard out. If jobkey includes
+  globs, it should list all jobs that match the glob.
+* `aurora job kill *jobkey*/*instanceids*`: kill/stop some of a job's instances. This stops the job's tasks; if the job
+  has service tasks, they'll be disabled, so that they won't restart.
+* `aurora job killall *jobkey*`: kill all of the instances of a job. This
+  is distinct from the *kill* command as a safety measure: omitting the
+  instances from a kill command shouldn't result in destroying the entire job.
+* `aurora job restart *jobkey*`: conceptually, this will kill a job, and then
+  launch it again. If the job does not exist, then fail with an error
+  message.  In fact, the underlying implementation does the
+  kill/relaunch on a rolling basis - so it's not an immediate kill of
+  all shards/instances, followed by a delay as all instances relaunch,
+  but rather a controlled gradual process.
+* `aurora job list *jobkey*`: list all jobs that match the jobkey spec that are
+  registered with the scheduler. This will include both jobs that are
+  currently running, and jobs that are scheduled to run at a later
+  time. The job key can be partial: if it specifies cluster, all jobs
+  on the cluster will be listed; cluster/role, all jobs running on the cluster under the role will be listed, etc.
+
+### The Schedule Noun (Cron)
+
+
+Note (3/21/2014): The "cron" noun is _not_ implemented yet.
+
+Cron is a scheduler adjunct that periodically runs a job on a
+schedule. The cron commands all manipulate cron schedule entries. The
+schedules are specified as a part of the job configuration.
+
+* `aurora cron schedule jobkey config`: schedule a job to run by cron.
+* `aurora cron deschedule jobkey`: removes a job's entry from the cron schedule.
+* `aurora cron status jobkey`: query for a scheduled job's status.
+
+### The Quota Noun
+
+
+A quota is a data object maintained by the scheduler that specifies the maximum
+resources that may be consumed by jobs owned by a particular role. In the future,
+we may add new quota types. At some point, we'll also probably add an administrator
+command to set quotas.
+
+* `aurora quota get *cluster/role*`
+
+
+Implementation
+---------------
+
+The current command line is monolithic. Every command on an Aurora
+object is a top-level command in the Aurora client. In the
+restructured command line, each of the primary object types
+manipulated by Aurora should have its own sub-command.
+
+* Advantages of this approach:
+   * Easier to detangle the command-line processing. The top-level
+     command-processing will be a small set of subcommand
+     processors. Option processing for each subcommand can be offloaded
+     to a separate module.
+   * The aurora top-level help command will be much more
+     comprehensible. Instead of giving a huge list of every possible
+     command, it will present the list of top-level object types, and
+     then users can request help on the commands for a specific type
+     of object.
+   * The sub-commands can be separated into distinct command-line
+     tools when appropriate.
+
+### Command Structure and Options Processing
+
+The implementation will closely follow Pants goals. Pants goals use
+a static registration system to add new subcommands. In Pants, each
+goal command is an implementation of a command interface, and provides
+implementations of methods to register options and parameters, and to
+actually execute the command. In this design, commands are modular and
+easy to implement, debug, and combine in different ways.
+
+For the Aurora client, we plan to use a two-level variation of the
+basic concept from Pants. At the top level we will have nouns. A noun
+will define some common command-line parameters required by all of its
+verbs, and will provide a registration hook for attaching verbs. Nouns
+will be implemented as a subclass of a basic Noun type.
+
+Each verb will, similarly, be implemented as a subclass of Verb. Verbs
+will be able to specify command-line options and parameters.
+
+Both `Noun` and `Verb` will be subclasses of a common base-class `AuroraCommand`:
+
+    class AuroraCommand(object):
+      def get_options(self):
+        """Gets the set of command-line options objects for this command.
+        The result is a list of CommandOption objects.
+        """
+        pass
+
+      @property
+      def help(self):
+        """Returns the help message for this command"""
+
+      @property
+      def usage(self):
+        """Returns a short usage description of the command"""
+
+      @property
+      def name(self):
+        """Returns the command name"""
+
+
+A command-line tool will be implemented as an instance of a `CommandLine`:
+
+    class CommandLine(object):
+      """The top-level object implementing a command-line application."""
+
+      @property
+      def name(self):
+        """Returns the name of this command-line tool"""
+
+      def print_out(self, msg):
+        print(msg)
+
+      def print_err(self, msg):
+        print(msg, file=sys.stderr)
+
+      def register_noun(self, noun):
+        """Adds a noun to the application"""
+
+      def register_plugin(self, plugin):
+        """Adds a configuration plugin to the system"""
+
+
+Nouns are registered into a command-line using the `register_noun`
+method. They are weakly coupled to the application, making it easy to
+use a single noun in several different command-line tools. Nouns allow
+the registration of verbs using the `register_verb` method.
+
+When commands execute, they're given an instance of a *context object*.
+The context object must be an instance of a subclass of `AuroraCommandContext`.
+Options, parameters, and IO are all accessed using the context object. The context
+is created dynamically by the noun object owning the verb being executed. Developers
+are strongly encouraged to implement custom contexts for their nouns, and move functionality
+shared by the noun's verbs into the context object. The context interface is:
+
+    class Context(object):
+      class Error(Exception): pass
+
+      class ArgumentException(Error): pass
+
+      class CommandError(Error): pass
+
+      @classmethod
+      def exit(cls, code, msg):
+        """Exit the application with an error message"""
+        raise cls.CommandError(code, msg)
+
+      def print_out(self, msg, indent=0):
+        """Prints a message to standard out, with an indent"""
+
+      def print_err(self, msg, indent=0):
+        """Prints a message to standard err, with an indent"""
+
+
+In addition to nouns and verbs, there's one more kind of registerable
+component, called a *configuration plugin*. These objects add a set of
+command-line options that can be passed to *all* of the commands
+implemented in the tool. Before the command is executed, the
+configuration plugin will be invoked, and will process its
+command-line arguments. This is useful for general configuration
+changes, like establishing a secure tunnel to talk to machines in a
+datacenter. (A useful way to think of a plugin is as something like an
+aspect that can be woven into Aurora to provide environment-specific
+configuration.) A configuration plugin is implemented as an instance
+of class `ConfigurationPlugin`, and registered with the
+`register_plugin` method of the `CommandLine` object. The interface of
+a plugin is:
+
+    class ConfigurationPlugin(object):
+      """A component that can be plugged in to a command-line."""
+
+      @abstractmethod
+      def get_options(self):
+        """Return the set of options processed by this plugin"""
+
+      @abstractmethod
+      def execute(self, context):
+        """Run the context/command line initialization code for this plugin."""
+
+
+### Command Execution
+
+Option processing and command execution are built as a facade over Python's
+standard argparse. All of the actual argument processing is done by the
+argparse library.
+
+Once the options are processed, the framework will start to execute the command. Command execution consists of:
+
+1. Create a context object. The framework will use the argparse options to identify
+   which noun is being invoked, and will call that noun's `create_context` method.
+   The argparse options object will be stored in the context.
+2. Execute any configuration plugins. Before any command is invoked, the framework
+   will first iterate over all of the registered configuration plugins. For each
+   plugin, it will invoke the `execute` method.
+3. The noun will use the context to find out what verb is being invoked, and it will
+   then call that verb's `execute` method.
+4. The command will exit. Its return code will be whatever was returned by the verb's
+   `execute` method.
+
+Commands are expected to return a code from a list of standard exit codes,
+which can be found in `src/main/python/apache/aurora/client/cli/__init__.py`.
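
The noun/verb registration and the four execution steps described in the added `clientv2.md` can be sketched on top of `argparse` roughly as follows. This is an illustrative reduction under stated assumptions, not the Aurora client's code: `setup_options`, `build_parser`, `QuotaNoun`, `QuotaGet`, and `make_client` are hypothetical names, and the `Context` here is far smaller than the interface the design describes.

```python
import argparse


class Context(object):
    """Holds the parsed argparse options for one command invocation."""
    def __init__(self, options):
        self.options = options

    def print_out(self, msg, indent=0):
        print(" " * indent + msg)


class Verb(object):
    name = None
    help = ""

    def setup_options(self, parser):
        """Add this verb's options and parameters to an argparse parser."""

    def execute(self, context):
        raise NotImplementedError


class Noun(object):
    name = None
    help = ""

    def __init__(self):
        self.verbs = {}

    def register_verb(self, verb):
        self.verbs[verb.name] = verb

    def create_context(self, options):
        return Context(options)

    def execute(self, context):
        # Step 3: find the verb being invoked and run it.
        return self.verbs[context.options.verb].execute(context)


class CommandLine(object):
    def __init__(self, name):
        self.name = name
        self.nouns = {}
        self.plugins = []

    def register_noun(self, noun):
        self.nouns[noun.name] = noun

    def register_plugin(self, plugin):
        self.plugins.append(plugin)

    def build_parser(self):
        parser = argparse.ArgumentParser(prog=self.name)
        noun_parsers = parser.add_subparsers(dest="noun")
        noun_parsers.required = True  # a bare invocation yields a usage error
        for noun in self.nouns.values():
            noun_parser = noun_parsers.add_parser(noun.name, help=noun.help)
            verb_parsers = noun_parser.add_subparsers(dest="verb")
            verb_parsers.required = True
            for verb in noun.verbs.values():
                verb.setup_options(
                    verb_parsers.add_parser(verb.name, help=verb.help))
        return parser

    def execute(self, args):
        options = self.build_parser().parse_args(args)
        noun = self.nouns[options.noun]
        context = noun.create_context(options)  # step 1
        for plugin in self.plugins:             # step 2
            plugin.execute(context)
        return noun.execute(context)            # steps 3 and 4


class QuotaGet(Verb):
    name = "get"
    help = "Print the quota for a cluster/role"

    def setup_options(self, parser):
        parser.add_argument("role_key", help="cluster/role, e.g. west/www-data")

    def execute(self, context):
        cluster, role = context.options.role_key.split("/")
        context.print_out("quota for role %s on cluster %s" % (role, cluster))
        return 0


class QuotaNoun(Noun):
    name = "quota"
    help = "Work with quotas"


def make_client():
    quota = QuotaNoun()
    quota.register_verb(QuotaGet())
    client = CommandLine("aurora")
    client.register_noun(quota)
    return client
```

With this structure, `make_client().execute(["quota", "get", "west/www-data"])` returns the verb's exit code, and an unknown noun produces an `argparse` usage error instead of a stack trace, which is the behavior the Helpfulness requirement asks for.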

Added: incubator/aurora/site/source/documentation/latest/committers.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/committers.md?rev=1623604&view=auto
==============================================================================
--- incubator/aurora/site/source/documentation/latest/committers.md (added)
+++ incubator/aurora/site/source/documentation/latest/committers.md Tue Sep  9 00:47:19 2014
@@ -0,0 +1,49 @@
+Setting up your email account
+-----------------------------
+Once your Apache ID has been set up, you can configure your account, add SSH keys, and set up an
+email forwarding address at
+
+  http://id.apache.org
+
+Additional instructions for setting up your new committer email can be found at
+
+  http://www.apache.org/dev/user-email.html
+
+The recommended setup is to configure all services (mailing lists, JIRA, ReviewBoard) to send
+emails to your @apache.org email address.
+
+
+Creating a release
+------------------
+The following will guide you through the steps to create a release candidate, hold a vote, and finally
+publish an official Apache Aurora release. Before starting, your GPG key should be in the KEYS file and you
+must have access to commit to the dist.a.o repositories.
+
+1. Ensure that all issues resolved for this release candidate are tagged with the correct Fix
+Version in JIRA; the changelog script will use this to generate the CHANGELOG in step #2.
+
+2. Create a release candidate. This will automatically update the CHANGELOG and commit it, create a
+branch, and update the current version within the trunk. To create a minor version update and publish
+it, run
+
+               ./build-support/release/release-candidate -l m -p
+
+3. Update, if necessary, the draft email created from the `release-candidate` script in step #2 and
+send the [VOTE] email to the dev@ and private@ mailing lists. You can verify the release signature
+and checksums by running
+
+               ./build-support/release/verify-release-candidate
+
+4. Wait for the vote to complete. If the vote fails, address any issues, go back to step #1, and
+run again, this time using the -r flag to increment the release candidate version. This will
+automatically clean up the release candidate rc0 branch and source distribution.
+
+               ./build-support/release/release-candidate -l m -r 1 -p
+
+5. Once the vote has successfully passed, create the release
+
+               ./build-support/release/release
+
+6. Update the draft email created from the `release` script in step #5 to include the Apache IDs for
+all binding votes and send the [RESULT][VOTE] email to the dev@ and private@ mailing lists.
+

Modified: incubator/aurora/site/source/documentation/latest/configuration-reference.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/configuration-reference.md?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/source/documentation/latest/configuration-reference.md (original)
+++ incubator/aurora/site/source/documentation/latest/configuration-reference.md Tue Sep  9 00:47:19 2014
@@ -28,6 +28,7 @@ Aurora + Thermos Configuration Reference
     - [Services](#services)
     - [UpdateConfig Objects](#updateconfig-objects)
     - [HealthCheckConfig Objects](#healthcheckconfig-objects)
+    - [Announcer Objects](#announcer-objects)
 - [Specifying Scheduling Constraints](#specifying-scheduling-constraints)
 - [Template Namespaces](#template-namespaces)
     - [mesos Namespace](#mesos-namespace)
@@ -341,7 +342,7 @@ Parameters for controlling the rate and 
 | ---------------------------- | :------: | ------------
 | ```batch_size```             | Integer  | Maximum number of shards to be updated in one iteration (Default: 1)
 | ```restart_threshold```      | Integer  | Maximum number of seconds before a shard must move into the ```RUNNING``` state before considered a failure (Default: 60)
-| ```watch_secs```             | Integer  | Minimum number of seconds a shard must remain in ```RUNNING``` state before considered a success (Default: 30)
+| ```watch_secs```             | Integer  | Minimum number of seconds a shard must remain in ```RUNNING``` state before considered a success (Default: 45)
 | ```max_per_shard_failures``` | Integer  | Maximum number of restarts per shard during update. Increments total failure count when this limit is exceeded. (Default: 0)
 | ```max_total_failures```     | Integer  | Maximum number of shard failures to be tolerated in total during an update. Cannot be greater than or equal to the total number of tasks in a job. (Default: 0)
 
@@ -351,11 +352,47 @@ Parameters for controlling a task's heal
 
 | object                         | type      | description
 | -------                        | :-------: | --------
-| ```initial_interval_secs```    | Integer   | Initial delay for performing an HTTP health check. (Default: 60)
-| ```interval_secs```            | Integer   | Interval on which to check the task's health via HTTP. (Default: 30)
+| ```initial_interval_secs```    | Integer   | Initial delay for performing an HTTP health check. (Default: 15)
+| ```interval_secs```            | Integer   | Interval on which to check the task's health via HTTP. (Default: 10)
 | ```timeout_secs```             | Integer   | HTTP request timeout. (Default: 1)
 | ```max_consecutive_failures``` | Integer   | Maximum number of consecutive failures that tolerated before considering a task unhealthy (Default: 0)
 
+### Announcer Objects
+
+If the `announce` field in the Job configuration is set, each task will be
+registered in the ServerSet `/aurora/role/environment/jobname` in the
+ZooKeeper ensemble configured by the executor.  If no Announcer object is specified,
+no announcement will take place.  For more information about ServerSets, see the [User Guide](/documentation/latest/user-guide/).
+
+| object                         | type      | description
+| -------                        | :-------: | --------
+| ```primary_port```             | String    | Which named port to register as the primary endpoint in the ServerSet (Default: `http`)
+| ```portmap```                  | dict      | A mapping of additional endpoints to be announced in the ServerSet (Default: `{ 'aurora': '{{primary_port}}' }`)
+
+### Port aliasing with the Announcer `portmap`
+
+The primary endpoint registered in the ServerSet is the one allocated to the port
+specified by the `primary_port` in the `Announcer` object, by default
+the `http` port.  This port can be referenced from anywhere within a configuration
+as `{{thermos.ports[http]}}`.
+
+Without the port map, each named port would be allocated a unique port number.
+The `portmap` allows two different named ports to be aliased together.  The default
+`portmap` aliases the `aurora` port (i.e. `{{thermos.ports[aurora]}}`) to
+the `http` port.  Even though the two ports can be referenced independently,
+only one port is allocated by Mesos.  Any port that is referenced in a `Process` object
+but is not in the portmap will be allocated dynamically by Mesos and announced as well.
+
+It is possible to use the portmap to alias names to static port numbers, e.g.
+`{'http': 80, 'https': 443, 'aurora': 'http'}`.  In this case, referencing
+`{{thermos.ports[aurora]}}` would look up `{{thermos.ports[http]}}` then
+find a static port 80.  No port would be requested of or allocated by Mesos.
+
+Static ports should be used cautiously as Aurora does nothing to prevent two
+tasks with the same static port allocations from being co-scheduled.
+External constraints such as slave attributes should be used to enforce such
+guarantees should they be needed.
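The aliasing rules above can be sketched in a few lines of Python. This is only an illustration, not Aurora's implementation: `resolve_port` and `make_allocator` are hypothetical names, the allocator is a stand-in for Mesos, and the template reference `{{primary_port}}` is assumed to have already been expanded to `http`.

```python
def resolve_port(name, portmap, allocate):
    """Follow aliases in portmap until a static number or an unmapped name is reached."""
    seen = set()
    while name in portmap:
        if name in seen:
            raise ValueError("alias cycle at %r" % name)
        seen.add(name)
        target = portmap[name]
        if isinstance(target, int):
            return target        # static port: nothing is requested from Mesos
        name = target            # alias: keep following the chain
    return allocate(name)        # unmapped name: allocated dynamically


def make_allocator(start=31000):
    """Hypothetical stand-in for Mesos: hands out one port per distinct name."""
    assigned = {}

    def allocate(name):
        if name not in assigned:
            assigned[name] = start + len(assigned)
        return assigned[name]

    allocate.assigned = assigned  # exposed so callers can inspect what was allocated
    return allocate
```

With the default map `{'aurora': 'http'}`, resolving `aurora` and `http` yields the same port and only one allocation is made. With `{'http': 80, 'https': 443, 'aurora': 'http'}`, resolving `aurora` returns 80 and the allocator is never called.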
+
 Specifying Scheduling Constraints
 =================================
 

Modified: incubator/aurora/site/source/documentation/latest/configuration-tutorial.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/configuration-tutorial.md?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/source/documentation/latest/configuration-tutorial.md (original)
+++ incubator/aurora/site/source/documentation/latest/configuration-tutorial.md Tue Sep  9 00:47:19 2014
@@ -589,7 +589,7 @@ The final three Job attributes each take
     -   `restart_threshold`: An integer, defaulting to `60`, specifying
         the maximum number of seconds before a shard must move into the
         `RUNNING` state before considered a failure.
-    -   `watch_secs`: An integer, defaulting to `30`, specifying the
+    -   `watch_secs`: An integer, defaulting to `45`, specifying the
         minimum number of seconds a shard must remain in the `RUNNING`
         state before considered a success.
     -   `max_per_shard_failures`: An integer, defaulting to `0`,
@@ -604,9 +604,9 @@ The final three Job attributes each take
     parameters for controlling a Task's health checks via HTTP. Only
     used if a health port was assigned with a command line wildcard. The
     `HealthCheckConfig` parameters are:
-    -   `initial_interval_secs`: An integer, defaulting to `60`,
+    -   `initial_interval_secs`: An integer, defaulting to `15`,
         specifying the initial delay for doing an HTTP health check.
-    -   `interval_secs`: An integer, defaulting to `30`, specifying the
+    -   `interval_secs`: An integer, defaulting to `10`, specifying the
         number of seconds in the interval between checking the Task's
         health.
     -   `timeout_secs`: An integer, defaulting to `1`, specifying the

Modified: incubator/aurora/site/source/documentation/latest/contributing.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/contributing.md?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/source/documentation/latest/contributing.md (original)
+++ incubator/aurora/site/source/documentation/latest/contributing.md Tue Sep  9 00:47:19 2014
@@ -1,21 +1,14 @@
+Find Something to Do
+--------------------
+There are issues in [Jira](https://issues.apache.org/jira/browse/AURORA) with the
+["newbie" tag](https://issues.apache.org/jira/browse/AURORA-189?jql=project%20%3D%20AURORA%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20newbie%20ORDER%20BY%20priority%20DESC)
+that are good starting places for new Aurora contributors; pick one of these and dive in! Once
+you've got a patch, the next step is to post a review.
+
 Getting your ReviewBoard Account
 --------------------------------
 Go to https://reviews.apache.org and create an account.
 
-Setting up your email account (committers)
-------------------------------------------
-Once your Apache ID has been set up you can configure your account and add ssh keys and
-setup an email forwarding address at
-
-  http://id.apache.org
-
-Additional instructions for setting up your new committer email can be found at
-
-  http://www.apache.org/dev/user-email.html
-
-The recommended setup is to configure all services (mailing lists, JIRA, ReviewBoard) to
-send emails to your @apache.org email address.
-
 Setting up your ReviewBoard Environment
 ---------------------------------------
 Run `./rbt status`. The first time this runs it will bootstrap and you will be asked to login.
@@ -25,7 +18,9 @@ Submitting a Patch for Review
 -----------------------------
 Post a review with `rbt`, fill out the fields in your browser and hit Publish.
 
-    ./rbt post -o -g
+    ./rbt post -o
+
+Once you've done this, you probably want to mark the associated Jira issue as Reviewable.
 
 Updating an Existing Review
 ---------------------------
@@ -59,3 +54,9 @@ Sometimes you'll need to merge someone e
     ./rbt patch -c <RB_ID>
     git show master  # Verify everything looks sane, author is correct
     git push origin master
+
+Cleaning Up
+-----------
+Your patch has landed, congratulations! The last thing you'll want to do before moving on to your
+next fix is to clean up your Jira issue and ReviewBoard review. Mark the Jira issue as
+"Resolved" and the review as "Submitted".

Modified: incubator/aurora/site/source/documentation/latest/deploying-aurora-scheduler.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/deploying-aurora-scheduler.md?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/source/documentation/latest/deploying-aurora-scheduler.md (original)
+++ incubator/aurora/site/source/documentation/latest/deploying-aurora-scheduler.md Tue Sep  9 00:47:19 2014
@@ -5,7 +5,7 @@ Installing Aurora
 =================
 Aurora is a standalone Java server. As part of the build process it creates a bundle of all its
 dependencies, with the notable exceptions of the JVM and libmesos. Each target server should have
-a JVM (Java 7 or higher) and libmesos (0.17.0) installed.
+a JVM (Java 7 or higher) and libmesos (0.18.0) installed.
 
 Creating the Distribution .zip File (Optional)
 ----------------------------------------------
@@ -47,7 +47,6 @@ Like Mesos, Aurora uses command-line fla
     # Flags controlling the scheduler.
     AURORA_FLAGS=(
       -http_port=8081
-      -thrift_port=8082
       # Log configuration, etc.
     )
 
@@ -92,8 +91,8 @@ should be set to `2`, and in a cluster o
 
 Network considerations
 ----------------------
-The Aurora scheduler listens on 3 ports - a Thrift port for client RPCs, an admin web UI, and a
-libprocess (HTTP+Protobuf) port used to communicate with the Mesos master and for the log
+The Aurora scheduler listens on 2 ports - an HTTP port used for client RPCs and a web UI,
+and a libprocess (HTTP+Protobuf) port used to communicate with the Mesos master and for the log
 replication protocol. These can be left unconfigured (the scheduler publishes all selected ports
 to ZooKeeper) or explicitly set in the startup script as follows:
 
@@ -101,7 +100,6 @@ to ZooKeeper) or explicitly set in the s
     AURORA_FLAGS=(
       # ...
       -http_port=8081
-      -thrift_port=8082
       # ...
     )
     # ...

Added: incubator/aurora/site/source/documentation/latest/developing-aurora-client.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/developing-aurora-client.md?rev=1623604&view=auto
==============================================================================
--- incubator/aurora/site/source/documentation/latest/developing-aurora-client.md (added)
+++ incubator/aurora/site/source/documentation/latest/developing-aurora-client.md Tue Sep  9 00:47:19 2014
@@ -0,0 +1,114 @@
+
+Getting Started
+=================
+
+Aurora consists of four main pieces: the scheduler (which finds resources in the cluster that can be used to run a job), the executor (which uses the resources assigned by the scheduler to run a job), the command-line client, and the web UI. For information about working on the scheduler or the web UI, see the file "developing-aurora-scheduler.md" in this directory.
+
+If you want to work on the command-line client, this is the place for you!
+
+The client is written in Python, and unlike the server side of things, we build the client using the Pants build tool instead of Gradle. Pants is a tool that was built by Twitter for handling builds of large collaborative systems. You can see a detailed explanation of
+Pants [here](http://pantsbuild.github.io/python-readme.html).
+
+To build the client executable, run the following in a command-shell:
+
+    $ ./pants src/main/python/apache/aurora/client/cli:aurora2
+
+This will produce a python executable _pex_ file in `dist/aurora2.pex`. Pex files
+are fully self-contained executables: just copy the pex file into your path, and you'll be able to run it. For example, for a typical installation:
+
+    $ cp dist/aurora2.pex /usr/local/bin/aurora
+
+To run all of the client tests:
+
+    $ ./pants src/test/python/apache/aurora/client/:all
+
+
+Client Versions
+==================
+
+There are currently two versions of the aurora client, imaginatively known as v1 and v2. All new development is done entirely in v2, but we continue to support and fix bugs in v1 until v2 is feature-complete and tested, and aurora users have had some time to adapt and switch their processes to use v2.
+
+Both versions are built on the same underlying API code.
+
+Client v1 was implemented using twitter.common.app. The command-line processing code for v1 can be found in `src/main/python/apache/aurora/client/commands` and
+`src/main/python/apache/aurora/client/bin`.
+
+Client v2 was implemented using its own noun/verb framework. The client v2 code can be found in `src/main/python/apache/aurora/client/cli`, and the noun/verb framework can be
+found in the `__init__.py` file in that directory.
+
+
+Building and Testing the Client
+=================================
+
+Building and testing the client code are both done using Pants. The relevant targets to know about are:
+
+   * Build a client v2 executable: `./pants src/main/python/apache/aurora/client/cli:aurora2`
+   * Test client v2 code: `./pants src/test/python/apache/aurora/client/cli:all`
+   * Build a client v1 executable: `./pants src/main/python/apache/aurora/client/bin:aurora_client`
+   * Test client v1 code: `./pants src/main/python/apache/aurora/client/commands:all`
+   * Test all client code: `./pants src/main/python/apache/aurora/client:all`
+
+
+Overview of the Client Architecture
+=====================================
+
+The client is built on a stacked architecture:
+
+   1. At the lowest level, we have a thrift RPC API interface
+    to the aurora scheduler. The interface is declared in thrift, in the file
+    `src/main/thrift/org/apache/aurora/gen/api.thrift`.
+
+  2. On top of the primitive API, we have a client API. The client API
+    takes the primitive operations provided by the scheduler, and uses them
+    to implement client-side behaviors. For example, when you update a job,
+    on the scheduler, that's done by a sequence of operations.  The sequence is implemented
+    by the client API `update` method, which does the following using the thrift API:
+     * fetching the state of task instances in the mesos cluster, and figuring out which need
+       to be updated;
+     * For each task to be updated:
+         - killing the old version;
+         - starting the new version;
+         - monitoring the new version to ensure that the update succeeded.
+  3. On top of the API, we have the command-line client itself. The core client, at this level,
+    consists of the interface to the command-line which the user will use to interact with aurora.
+    The client v2 code is found in `src/main/python/apache/aurora/client/cli`. In the `cli` directory,
+    the rough structure is as follows:
+       * `__init__.py` contains the noun/verb command-line processing framework used by client v2.
+       * `jobs.py` contains the implementation of the core `job` noun, and all of its operations.
+       * `bridge.py` contains the implementation of a component that allows us to ship a
+         combined client that runs both v1 and v2 client commands during the transition period.
+       * `client.py` contains the code that binds the client v2 nouns and verbs into an executable.
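The `update` sequence described in layer 2 can be sketched as follows. This is a simplified illustration, not the client's actual code: `FakeScheduler` and its method names are hypothetical stand-ins for the thrift API, configurations are reduced to plain strings, and rollback and batching are omitted.

```python
def update(api, job, new_config):
    """Replace every out-of-date instance: kill it, restart it, then watch it."""
    current = api.get_instances(job)  # {instance_id: config}
    stale = [i for i, cfg in sorted(current.items()) if cfg != new_config]
    for instance in stale:
        api.kill(job, instance)
        api.start(job, instance, new_config)
        if not api.watch(job, instance):
            return False  # stop at the first instance that fails to come up
    return True


class FakeScheduler:
    """Hypothetical in-memory stand-in for the scheduler's thrift API."""

    def __init__(self, instances):
        self.instances = dict(instances)
        self.calls = []

    def get_instances(self, job):
        return dict(self.instances)

    def kill(self, job, instance):
        self.calls.append(("kill", instance))
        del self.instances[instance]

    def start(self, job, instance, config):
        self.calls.append(("start", instance))
        self.instances[instance] = config

    def watch(self, job, instance):
        self.calls.append(("watch", instance))
        return instance in self.instances
```

Keeping the scheduler behind a small interface like this is what lets the client-side sequence be exercised without a running cluster.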
+
+Running/Debugging the Client
+=============================
+
+For manually testing client changes against a cluster, we use vagrant. To start a virtual cluster,
+you need to install a working vagrant environment, and then run "vagrant up" from the root of
+the aurora workspace. This will create a vagrant host named "devcluster", with a mesos master,
+a set of mesos slaves, and an aurora scheduler.
+
+To use the devcluster, you need to bring it up by running `vagrant up`, and then connect to the vagrant host using `vagrant ssh`. This will open a bash session on the virtual machine hosting the devcluster. In the home directory, there are two key paths to know about:
+
+   * `~/aurora`: this is a copy of the git workspace in which you launched the vagrant cluster.
+     To test client changes, you'll use this copy.
+   * `/vagrant`: this is a mounted filesystem that's a direct image of your git workspace.
+     This isn't a copy - it is your git workspace. Editing files on your host machine will
+     be immediately visible here, because they are the same files.
+
+Whenever the scheduler is modified, to update your vagrant environment to use the new scheduler,
+you'll need to re-initialize your vagrant images. To do this, you need to run two commands:
+
+   * `vagrant destroy`: this will delete the old devcluster image.
+   * `vagrant up`: this creates a fresh devcluster image based on the current state of your workspace.
+
+You should try to minimize rebuilding vagrant images; it's not horribly slow, but it does take a while.
+
+To test client changes:
+
+   * Make a change in your local workspace, and commit it.
+   * `vagrant ssh` into the devcluster.
+   * `cd aurora`
+   * Pull your changes into the vagrant copy: `git pull /vagrant *branchname*`.
+   * Build the modified client using pants.
+   * Run your command using `aurora2`. (You don't need to do any install; the aurora2 command
+     is a symbolic link to the executable generated by pants.)

Modified: incubator/aurora/site/source/documentation/latest/developing-aurora-scheduler.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/developing-aurora-scheduler.md?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/source/documentation/latest/developing-aurora-scheduler.md (original)
+++ incubator/aurora/site/source/documentation/latest/developing-aurora-scheduler.md Tue Sep  9 00:47:19 2014
@@ -20,6 +20,14 @@ tests use
 
     ./gradlew clean build
 
+Running the build with code quality checks
+------------------------------------------
+To speed up development iteration, the plain gradle commands will not run static analysis tools.
+However, you should run these checks before posting a review diff, and **always** run them before pushing a
+commit to origin/master.
+
+    ./gradlew build -Pq
+
 Creating a bundle for deployment
 --------------------------------
 Gradle can create a zip file containing Aurora, all of its dependencies, and a launch script with

Added: incubator/aurora/site/source/documentation/latest/sla.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/sla.md?rev=1623604&view=auto
==============================================================================
--- incubator/aurora/site/source/documentation/latest/sla.md (added)
+++ incubator/aurora/site/source/documentation/latest/sla.md Tue Sep  9 00:47:19 2014
@@ -0,0 +1,176 @@
+Aurora SLA Measurement
+--------------
+
+- [Overview](#overview)
+- [Metric Details](#metric-details)
+  - [Platform Uptime](#platform-uptime)
+  - [Job Uptime](#job-uptime)
+  - [Median Time To Assigned (MTTA)](#median-time-to-assigned-\(mtta\))
+  - [Median Time To Running (MTTR)](#median-time-to-running-\(mttr\))
+- [Limitations](#limitations)
+
+## Overview
+
+The primary goal of the feature is collection and monitoring of Aurora job SLA (Service Level
+Agreements) metrics that define a contractual relationship between the Aurora/Mesos platform
+and hosted services.
+
+The Aurora SLA feature currently supports stat collection only for service (non-cron)
+production jobs (`"production = True"` in your `.aurora` config).
+
+Counters that track SLA measurements are computed periodically within the scheduler.
+The individual instance metrics are refreshed every minute (configurable via
+`sla_stat_refresh_interval`). The instance counters are subsequently aggregated by
+relevant grouping types before being exported to the scheduler `/vars` endpoint (when using `vagrant`,
+that would be `http://192.168.33.7:8081/vars`).
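As a rough sketch of how the exported counters might be consumed, the helper below filters SLA metrics out of a `/vars` response. It assumes the endpoint returns plain text with one whitespace-separated `name value` pair per line; that format and the function name `parse_sla_vars` are assumptions, not something this document specifies.

```python
def parse_sla_vars(text):
    """Return {name: float} for every 'name value' line whose name starts with 'sla_'."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].startswith("sla_"):
            continue
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue  # skip non-numeric values
    return metrics
```

Against the vagrant cluster, the input text could be fetched with `urllib.request.urlopen("http://192.168.33.7:8081/vars").read().decode()`.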
+
+## Metric Details
+
+### Platform Uptime
+
+*Aggregate amount of time a job spends in a non-runnable state due to platform unavailability
+or scheduling delays. This metric tracks Aurora/Mesos uptime performance and reflects on any
+system-caused downtime events (tasks LOST or DRAINED). Any user-initiated task kills/restarts
+will not degrade this metric.*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_platform_uptime_percent`
+* Per cluster - `sla_cluster_platform_uptime_percent`
+
+**Units:** percent
+
+A fault in the task environment may cause Aurora and Mesos to have different views on the task state
+or lose track of the task existence. In such cases, the service task is marked as LOST and
+rescheduled by Aurora. For example, this may happen when the task stays in ASSIGNED or STARTING
+for too long or the Mesos slave becomes unhealthy (or disappears completely). The time between
+task entering LOST and its replacement reaching RUNNING state is counted towards platform downtime.
+
+Another example of a platform downtime event is the administrator-requested task rescheduling. This
+happens during planned Mesos slave maintenance when all slave tasks are marked as DRAINED and
+rescheduled elsewhere.
+
+To accurately calculate Platform Uptime, we must separate platform incurred downtime from user
+actions that put a service instance in a non-operational state. It is simpler to isolate
+user-incurred downtime and treat all other downtime as platform incurred.
+
+Currently, a user can cause a healthy service (task) downtime in only two ways: via `killTasks`
+or `restartShards` RPCs. For both, their affected tasks leave an audit state transition trail
+relevant to uptime calculations. By applying a special "SLA meaning" to exposed task state
+transition records, we can build a deterministic downtime trace for every given service instance.
+
+A task going through a state transition carries one of three possible SLA meanings
+(see [SlaAlgorithm.java](../src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
+sla-to-task-state mapping):
+
+* Task is UP: starts a period where the task is considered to be up and running from the Aurora
+  platform standpoint.
+
+* Task is DOWN: starts a period where the task cannot reach the UP state for some
+  non-user-related reason. Counts towards instance downtime.
+
+* Task is REMOVED from SLA: starts a period where the task is not expected to be UP due to
+  user initiated action or failure. We ignore this period for the uptime calculation purposes.
+
+This metric is recalculated over the last sampling period (last minute) to account for
+any UP/DOWN/REMOVED events. It ignores any UP/DOWN events not immediately adjacent to the
+sampling interval as well as adjacent REMOVED events.
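The UP/DOWN/REMOVED bookkeeping can be illustrated with a small calculation over one sampling window. This is a simplified sketch under stated assumptions, not the logic in `SlaAlgorithm.java`: it takes the last event at or before the window start as the initial state, sums the time spent in each state inside the window, and excludes REMOVED time from the denominator. The function name and the event representation are hypothetical.

```python
def uptime_percent(events, start, end):
    """events: chronological [(timestamp, 'UP' | 'DOWN' | 'REMOVED')] for one instance."""
    totals = {"UP": 0.0, "DOWN": 0.0, "REMOVED": 0.0}
    state, cursor = None, start
    for ts, meaning in events:
        if ts <= start:
            state = meaning  # last event before the window sets the initial state
            continue
        if ts >= end:
            break
        if state is not None:
            totals[state] += ts - cursor
        state, cursor = meaning, ts
    if state is not None:
        totals[state] += end - cursor
    counted = totals["UP"] + totals["DOWN"]  # REMOVED time is ignored entirely
    return 100.0 * totals["UP"] / counted if counted else None
```

For a window from second 60 to second 120, an instance that was UP, went DOWN at 70 and came back UP at 85 spends 45 seconds up and 15 seconds down, which gives 75 percent.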
+
+### Job Uptime
+
+*Percentage of the job instances considered to be in RUNNING state for the specified duration
+relative to request time. This is a purely application side metric that is considering aggregate
+uptime of all RUNNING instances. Any user- or platform initiated restarts directly affect
+this metric.*
+
+**Collection scope:** We currently expose job uptime values at 5 pre-defined
+percentiles (50th, 75th, 90th, 95th and 99th):
+
+* `sla_<job_key>_job_uptime_50_00_sec`
+* `sla_<job_key>_job_uptime_75_00_sec`
+* `sla_<job_key>_job_uptime_90_00_sec`
+* `sla_<job_key>_job_uptime_95_00_sec`
+* `sla_<job_key>_job_uptime_99_00_sec`
+
+**Units:** seconds
+
+You can also get customized real-time stats from the Aurora client. See `aurora sla -h` for
+more details.
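One way to read the percentile counters is "the uptime that at least N percent of instances have reached". The sketch below computes that value with a nearest-rank rule. The rule itself and the function name are assumptions; the scheduler may interpolate differently.

```python
import math


def uptime_at_percentile(uptimes_sec, percentile):
    """Largest uptime that at least `percentile` percent of instances have reached."""
    if not uptimes_sec:
        return None
    ordered = sorted(uptimes_sec, reverse=True)              # longest-running first
    rank = math.ceil(percentile * len(ordered) / 100)        # instances that must qualify
    return ordered[max(rank, 1) - 1]
```

With four instances up for 100, 200, 300 and 400 seconds, half of them have been up for at least 300 seconds, so the 50th percentile value is 300, while the 99th percentile value falls to 100 because every instance must qualify.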
+
+### Median Time To Assigned (MTTA)
+
+*Median time a job spends waiting for its tasks to be assigned to a host. This is a combined
+metric that helps track the dependency of scheduling performance on the requested resources
+(user scope) as well as the internal scheduler bin-packing algorithm efficiency (platform scope).*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_mtta_ms`
+* Per cluster - `sla_cluster_mtta_ms`
+* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
+[ResourceAggregates.java](../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
+  * By CPU:
+    * `sla_cpu_small_mtta_ms`
+    * `sla_cpu_medium_mtta_ms`
+    * `sla_cpu_large_mtta_ms`
+    * `sla_cpu_xlarge_mtta_ms`
+    * `sla_cpu_xxlarge_mtta_ms`
+  * By RAM:
+    * `sla_ram_small_mtta_ms`
+    * `sla_ram_medium_mtta_ms`
+    * `sla_ram_large_mtta_ms`
+    * `sla_ram_xlarge_mtta_ms`
+    * `sla_ram_xxlarge_mtta_ms`
+  * By DISK:
+    * `sla_disk_small_mtta_ms`
+    * `sla_disk_medium_mtta_ms`
+    * `sla_disk_large_mtta_ms`
+    * `sla_disk_xlarge_mtta_ms`
+    * `sla_disk_xxlarge_mtta_ms`
+
+**Units:** milliseconds
+
+MTTA only considers instances that have already reached ASSIGNED state and ignores those
+that are still PENDING. This ensures straggler instances (e.g. with unreasonable resource
+constraints) do not affect metric curves.
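The exclusion of PENDING instances can be shown with a short calculation. This is a minimal sketch with a hypothetical data shape (pairs of timestamps in milliseconds, with `None` for instances not yet assigned), not the scheduler's code.

```python
import statistics


def median_time_to_assigned_ms(instances):
    """instances: [(pending_at_ms, assigned_at_ms or None)]; still-PENDING ones are skipped."""
    waits = [assigned - pending for pending, assigned in instances if assigned is not None]
    return statistics.median(waits) if waits else None
```

Given waits of 100, 300 and 200 milliseconds plus one instance that is still PENDING, the result is 200: the straggler does not pull the median upward.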
+
+### Median Time To Running (MTTR)
+
+*Median time a job waits for its tasks to reach RUNNING state. This is a comprehensive metric
+reflecting on the overall time it takes for Aurora/Mesos to start executing user content.*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_mttr_ms`
+* Per cluster - `sla_cluster_mttr_ms`
+* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
+[ResourceAggregates.java](../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
+  * By CPU:
+    * `sla_cpu_small_mttr_ms`
+    * `sla_cpu_medium_mttr_ms`
+    * `sla_cpu_large_mttr_ms`
+    * `sla_cpu_xlarge_mttr_ms`
+    * `sla_cpu_xxlarge_mttr_ms`
+  * By RAM:
+    * `sla_ram_small_mttr_ms`
+    * `sla_ram_medium_mttr_ms`
+    * `sla_ram_large_mttr_ms`
+    * `sla_ram_xlarge_mttr_ms`
+    * `sla_ram_xxlarge_mttr_ms`
+  * By DISK:
+    * `sla_disk_small_mttr_ms`
+    * `sla_disk_medium_mttr_ms`
+    * `sla_disk_large_mttr_ms`
+    * `sla_disk_xlarge_mttr_ms`
+    * `sla_disk_xxlarge_mttr_ms`
+
+**Units:** milliseconds
+
+MTTR only considers instances in RUNNING state. This ensures straggler instances (e.g. with
+unreasonable resource constraints) do not affect metric curves.
+
+## Limitations
+
+* The availability of Aurora SLA metrics is bound by the scheduler availability.
+
+* All metrics are calculated at a pre-defined interval (currently set at 1 minute).
+  Scheduler restarts may result in missed collections.
\ No newline at end of file

Modified: incubator/aurora/site/source/documentation/latest/user-guide.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/user-guide.md?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/source/documentation/latest/user-guide.md (original)
+++ incubator/aurora/site/source/documentation/latest/user-guide.md Tue Sep  9 00:47:19 2014
@@ -1,17 +1,20 @@
 Aurora User Guide
 -----------------
 
-- [Overview](#overview)
-- [Job Lifecycle](#job-lifecycle)
-  - [Life Of A Task](#life-of-a-task)
-  - [PENDING to RUNNING states](#pending-to-running-states)
-  - [Task Updates](#task-updates)
-  - [Giving Priority to Production Tasks: PREEMPTING](#giving-priority-to-production-tasks-preempting)
-  - [Natural Termination: FINISHED, FAILED](#natural-termination-finished-failed)
-  - [Forceful Termination: KILLING, RESTARTING](#forceful-termination-killing-restarting)
-- [Configuration](#configuration)
-- [Creating Jobs](#creating-jobs)
-- [Interacting With Jobs](#interacting-with-jobs)
+- [Overview](#user-content-overview)
+- [Job Lifecycle](#user-content-job-lifecycle)
+	- [Life Of A Task](#user-content-life-of-a-task)
+	- [PENDING to RUNNING states](#user-content-pending-to-running-states)
+	- [Task Updates](#user-content-task-updates)
+	- [HTTP Health Checking and Graceful Shutdown](#user-content-http-health-checking-and-graceful-shutdown)
+		- [Tearing a task down](#user-content-tearing-a-task-down)
+	- [Giving Priority to Production Tasks: PREEMPTING](#user-content-giving-priority-to-production-tasks-preempting)
+	- [Natural Termination: FINISHED, FAILED](#user-content-natural-termination-finished-failed)
+	- [Forceful Termination: KILLING, RESTARTING](#user-content-forceful-termination-killing-restarting)
+- [Service Discovery](#user-content-service-discovery)
+- [Configuration](#user-content-configuration)
+- [Creating Jobs](#user-content-creating-jobs)
+- [Interacting With Jobs](#user-content-interacting-with-jobs)
 
 Overview
 --------
@@ -107,9 +110,6 @@ When Aurora reads a configuration file a
 4.  The scheduler puts the `Task`s into `PENDING` state, starting each
     `Task`'s life cycle.
 
-**Note**: It is not currently possible to create an Aurora job from
-within an Aurora job.
-
 ### Life Of A Task
 
 ![Life of a task](images/lifeofatask.png)
@@ -186,6 +186,50 @@ with old instance configs and batch upda
 from the point where the update failed. E.g.; (0,1,2) (3,4,5) (6,7,
 8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0).
 
+### HTTP Health Checking and Graceful Shutdown
+
+The Executor implements a protocol for rudimentary control of a task via HTTP.  Tasks subscribe for
+this protocol by declaring a port named `health`.  Take for example this configuration snippet:
+
+    nginx = Process(
+      name = 'nginx',
+      cmdline = './run_nginx.sh -port {{thermos.ports[http]}}')
+
+When this Process is included in a job, the job will be allocated a port, and the command line
+will be replaced with something like:
+
+    ./run_nginx.sh -port 42816
+
+Where 42816 happens to be the allocated port.  Typically, the Executor monitors Processes within
+a task only by liveness of the forked process.  However, when a `health` port was allocated, it will
+also send periodic HTTP health checks.  A task requesting a `health` port must handle the following
+requests:
+
+| HTTP request            | Description                             |
+| ------------            | -----------                             |
+| `GET /health`           | Inquires whether the task is healthy.   |
+| `POST /quitquitquit`    | Task should initiate graceful shutdown. |
+| `POST /abortabortabort` | Final warning task is being killed.     |
+
+Please see the
+[configuration reference](configuration-reference.md#user-content-healthcheckconfig-objects) for
+configuration options for this feature.
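A task-side handler for the three requests in the table might look like the sketch below, written with the Python standard library. The response bodies (`ok`, `shutting down`) and the names `route` and `HealthHandler` are assumptions for illustration; this section does not say what the Executor expects in the response body.

```python
from http.server import BaseHTTPRequestHandler


def route(method, path):
    """Map a request to (status, body) for the three endpoints in the table."""
    if (method, path) == ("GET", "/health"):
        return 200, "ok"
    if method == "POST" and path in ("/quitquitquit", "/abortabortabort"):
        return 200, "shutting down"  # a real task would begin its shutdown here
    return 404, "not found"


class HealthHandler(BaseHTTPRequestHandler):
    """Serves the health protocol by delegating every request to route()."""

    def _handle(self):
        status, body = route(self.command, self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = _handle
```

To serve it, pass the allocated `health` port to the standard library server, for example `HTTPServer(("", port), HealthHandler).serve_forever()` with `HTTPServer` imported from `http.server`.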
+
+#### Tearing a task down
+
+The Executor follows an escalation sequence when killing a running task:
+
+  1. If `health` port is not present, skip to (5)
+  2. POST /quitquitquit
+  3. Wait 5 seconds
+  4. POST /abortabortabort
+  5. Send SIGTERM (`kill`)
+  6. Send SIGKILL (`kill -9`)
+
+If the Executor notices that all Processes in a Task have aborted during this sequence, it will
+not proceed with subsequent steps.  Note that graceful shutdown is best-effort, and due to the many
+inevitable realities of distributed systems, it may not be performed.
+
 ### Giving Priority to Production Tasks: PREEMPTING
 
 Sometimes a Task needs to be interrupted, such as when a non-production
@@ -233,12 +277,27 @@ Configuration
 
 You define and configure your Jobs (and their Tasks and Processes) in
 Aurora configuration files. Their filenames end with the `.aurora`
-suffix, and you write them in Python making use of the Pystashio
+suffix, and you write them in Python making use of the Pystachio
 templating language, along
 with specific Aurora, Mesos, and Thermos commands and methods. See the
 [Configuration Guide and Reference](/documentation/latest/configuration-reference/) and
 [Configuration Tutorial](/documentation/latest/configuration-tutorial/).
 
+Service Discovery
+-----------------
+
+It is possible for the Aurora executor to announce tasks into ServerSets for
+the purpose of service discovery.  ServerSets use the Zookeeper [group membership pattern](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_outOfTheBox)
+of which there are several reference implementations:
+
+  - [C++](https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp)
+  - [Java](https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/ServerSetImpl.java#L221)
+  - [Python](https://github.com/twitter/commons/blob/master/src/python/twitter/common/zookeeper/serverset/serverset.py#L51)
+
+These can also be used natively in Finagle using the [ZookeeperServerSetCluster](https://github.com/twitter/finagle/blob/master/finagle-serversets/src/main/scala/com/twitter/finagle/zookeeper/ZookeeperServerSetCluster.scala).
+
+For more information about how to configure announcing, see the [Configuration Reference](/documentation/latest/configuration-reference/).
+
 Creating Jobs
 -------------
 

Modified: incubator/aurora/site/source/documentation/latest/vagrant.md
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/documentation/latest/vagrant.md?rev=1623604&r1=1623603&r2=1623604&view=diff
==============================================================================
--- incubator/aurora/site/source/documentation/latest/vagrant.md (original)
+++ incubator/aurora/site/source/documentation/latest/vagrant.md Tue Sep  9 00:47:19 2014
@@ -1,18 +1,51 @@
-Aurora includes a `Vagrantfile` that defines a full Mesos cluster running Aurora. You can use it to
-explore Aurora's various components. To get started, install
-[VirtualBox](https://www.virtualbox.org/) and [Vagrant](http://www.vagrantup.com/),
-then run `vagrant up` somewhere in the repository source tree to create a team of VMs.  This may take some time initially as it builds all
-the components involved in running an aurora cluster.
-
-The scheduler is listening on http://192.168.33.7:8081/scheduler
-The observer is listening on http://192.168.33.7:1338
-The master is listening on http://192.168.33.7:5050
+Getting Started
+===============
+To replicate a real cluster environment as closely as possible, we use
+[Vagrant](http://www.vagrantup.com/) to launch a complete Aurora cluster in a virtual machine.
+
+Prerequisites
+-------------
+  * [VirtualBox](https://www.virtualbox.org/)
+  * [Vagrant](http://www.vagrantup.com/)
+  * A clone of the Aurora repository, or source distribution.
+
+You can start a local cluster by running:
+
+    vagrant up
+
+Once started, several services should be running:
+
+  * scheduler is listening on http://192.168.33.7:8081
+  * observer is listening on http://192.168.33.7:1338
+  * master is listening on http://192.168.33.7:5050
+  * slave is listening on http://192.168.33.7:5051
+
+You can SSH into the machine with `vagrant ssh` and execute aurora client commands using the
+`aurora` command.  A pre-installed `clusters.json` file refers to your local cluster as
+`devcluster`, which you will use in client commands.
+
+Deleting your local cluster
+===========================
+Once you are finished with your local cluster, or if you would otherwise like to start from scratch,
+you can use the command `vagrant destroy` to turn off and delete the virtual file system.
+
+
+Rebuilding components
+=====================
+If you are changing Aurora code and would like to rebuild a component, you can use the `aurorabuild`
+command on your vagrant machine to build and restart a component.  This is considerably faster than
+destroying and rebuilding your VM.
+
+`aurorabuild` accepts a list of components to build and update.  You may invoke the command with
+no arguments to get a list of supported components.
+
+     vagrant ssh -c 'aurorabuild client'
 
-Once everything is up, you can `vagrant ssh devcluster` and execute aurora client commands using the `aurora` client.
 
 Troubleshooting
----------------
+===============
 Most of the vagrant related problems can be fixed by the following steps:
 * Destroying the vagrant environment with `vagrant destroy`
+* Killing any orphaned VMs (see AURORA-499) with `virtualbox` UI or `VBoxManage` command line tool
 * Cleaning the repository of build artifacts and other intermediate output with `git clean -fdx`
 * Bringing up the vagrant environment with `vagrant up`

Added: incubator/aurora/site/source/layouts/documentation.erb
URL: http://svn.apache.org/viewvc/incubator/aurora/site/source/layouts/documentation.erb?rev=1623604&view=auto
==============================================================================
--- incubator/aurora/site/source/layouts/documentation.erb (added)
+++ incubator/aurora/site/source/layouts/documentation.erb Tue Sep  9 00:47:19 2014
@@ -0,0 +1,10 @@
+<% content_for :page_title do %>
+Documentation
+<% end %>
+<% wrap_layout :layout do %>
+<div class="row-fluid">
+	<div class="col-md-12">
+		<%= yield %>
+	</div>
+</div>
+<% end %>
\ No newline at end of file


