aurora-commits mailing list archives

From jfarr...@apache.org
Subject svn commit: r1739360 [6/8] - in /aurora/site: ./ data/ publish/ publish/blog/ publish/blog/aurora-0-13-0-released/ publish/documentation/0.10.0/ publish/documentation/0.10.0/build-system/ publish/documentation/0.10.0/client-cluster-configuration/ publi...
Date Fri, 15 Apr 2016 20:21:35 GMT
Added: aurora/site/source/documentation/0.13.0/getting-started/overview.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/getting-started/overview.md?rev=1739360&view=auto
==============================================================================
--- aurora/site/source/documentation/0.13.0/getting-started/overview.md (added)
+++ aurora/site/source/documentation/0.13.0/getting-started/overview.md Fri Apr 15 20:21:30 2016
@@ -0,0 +1,110 @@
+Aurora System Overview
+======================
+
+Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run
+long-running services, cron jobs, and ad-hoc jobs that take advantage of Apache Mesos' scalability,
+fault-tolerance, and resource isolation.
+
+
+Components
+----------
+
+It is important to understand the components that make up
+a functioning Aurora cluster.
+
+![Aurora Components](../images/components.png)
+
+* **Aurora scheduler**
+  The scheduler is your primary interface to the work you run in your cluster.  You will
+  instruct it to run jobs, and it will manage them in Mesos for you.  You will also frequently use
+  the scheduler's read-only web interface as a heads-up display for what's running in your cluster.
+
+* **Aurora client**
+  The client (`aurora` command) is a command line tool that exposes primitives that you can use to
+  interact with the scheduler. The client operates on jobs, and provides commands to create,
+  inspect, update, and kill them.
+
+  Aurora also provides an admin client (`aurora_admin` command) that contains commands built for
+  cluster administrators.  You can use this tool to do things like manage user quotas and perform
+  graceful maintenance on machines in the cluster.
+
+* **Aurora executor**
+  The executor (a.k.a. Thermos executor) is responsible for carrying out the workloads described in
+  the Aurora DSL (`.aurora` files).  The executor is what actually executes user processes.  It will
+  also perform health checking of tasks and register tasks in ZooKeeper for the purposes of dynamic
+  service discovery.
+
+* **Aurora observer**
+  The observer provides browser-based access to the status of individual tasks executing on worker
+  machines.  It gives insight into the processes executing, and facilitates browsing of task sandbox
+  directories.
+
+* **ZooKeeper**
+  [ZooKeeper](http://zookeeper.apache.org) is a distributed consensus system.  In an Aurora cluster
+it is used for reliable election of the leading Aurora scheduler and Mesos master.  It is also
+used as a vehicle for service discovery; see [Service Discovery](../features/service-discovery.md).
+
+* **Mesos master**
+  The master is responsible for tracking worker machines and performing accounting of their
+  resources.  The scheduler interfaces with the master to control the cluster.
+
+* **Mesos agent**
+  The agent receives work assigned by the scheduler and executes it.  It interfaces with Linux
+  isolation systems like cgroups, namespaces and Docker to manage the resource consumption of tasks.
+  When a user task is launched, the agent will launch the executor (in the context of a Linux cgroup
+  or Docker container depending upon the environment), which will in turn fork user processes.
+
+
+Jobs, Tasks and Processes
+--------------------------
+
+Aurora is a Mesos framework used to schedule *jobs* onto Mesos. Mesos
+cares about individual *tasks*, but typical jobs consist of dozens or
+hundreds of task replicas. Aurora provides a layer on top of Mesos with
+its `Job` abstraction. An Aurora `Job` consists of a task template and
+instructions for creating near-identical replicas of that task (modulo
+things like "instance id" or specific port numbers which may differ from
+machine to machine).
+
+The number of tasks that make up a job can vary: the near-identical replicas of the task
+template are otherwise referred to as "instances" or "shards".
+
+A task can merely be a single *process* corresponding to a single
+command line, such as `python2.7 my_script.py`. However, a task can also
+consist of many separate processes, which all run within a single
+sandbox. For example, a task might run multiple cooperating agents together,
+such as `logrotate`, `installer`, master, or slave processes. This is
+where Thermos comes in. While Aurora provides a `Job` abstraction on
+top of Mesos `Tasks`, Thermos provides a `Process` abstraction
+underneath Mesos `Task`s and serves as part of the Aurora framework's
+executor.
+
+You define `Job`s, `Task`s, and `Process`es in a configuration file.
+Configuration files are written in Python, and make use of the
+[Pystachio](https://github.com/wickman/pystachio) templating language,
+along with specific Aurora, Mesos, and Thermos commands and methods.
+The configuration files typically end with a `.aurora` extension.
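+
+To make that hierarchy concrete, here is a minimal, hypothetical sketch of a `.aurora` file
+(all names and resource values are illustrative only):
+
+```python
+# A Process: one command line.
+hello = Process(name = 'hello', cmdline = 'echo hello world')
+
+# A Task: a group of Processes plus the resources they share.
+hello_task = Task(
+  name = 'hello',
+  processes = [hello],
+  resources = Resources(cpu = 0.1, ram = 16*MB, disk = 16*MB))
+
+# A Job: a Task template plus instructions for replicating it.
+jobs = [
+  Job(cluster = 'devcluster',
+      environment = 'devel',
+      role = 'www-data',
+      name = 'hello',
+      task = hello_task,
+      instances = 2)
+]
+```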
+
+Summary:
+
+* Aurora manages jobs made of tasks.
+* Mesos manages tasks made of processes.
+* Thermos manages processes.
+* All of the above is defined in `.aurora` configuration files.
+
+![Aurora hierarchy](../images/aurora_hierarchy.png)
+
+Each `Task` has a *sandbox* created when the `Task` starts and garbage
+collected when it finishes. All of a `Task`'s processes run in its
+sandbox, so processes can share state by using a shared current working
+directory.
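+
+For instance (a hedged sketch; the file name and commands are illustrative), two Processes in
+the same Task can hand data to each other through the shared working directory:
+
+```python
+# The first Process writes a file into the sandbox...
+producer = Process(name = 'producer', cmdline = 'echo hello > greeting.txt')
+
+# ...and the second reads that same file, since both share one sandbox.
+consumer = Process(name = 'consumer', cmdline = 'cat greeting.txt')
+
+greeting_task = SequentialTask(
+  processes = [producer, consumer],
+  resources = Resources(cpu = 0.1, ram = 16*MB, disk = 16*MB))
+```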
+
+The sandbox garbage collection policy considers many factors, most
+importantly age and size. It makes a best-effort attempt to keep
+sandboxes around as long as possible post-task in order for service
+owners to inspect data and logs, should the `Task` have completed
+abnormally. But you can't design your applications assuming sandboxes
+will be around forever; instead, build log saving or other
+checkpointing mechanisms directly into your application or into your
+`Job` description.
+

Added: aurora/site/source/documentation/0.13.0/getting-started/tutorial.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/getting-started/tutorial.md?rev=1739360&view=auto
==============================================================================
--- aurora/site/source/documentation/0.13.0/getting-started/tutorial.md (added)
+++ aurora/site/source/documentation/0.13.0/getting-started/tutorial.md Fri Apr 15 20:21:30 2016
@@ -0,0 +1,258 @@
+# Aurora Tutorial
+
+This tutorial shows how to use the Aurora scheduler to run (and "`printf-debug`")
+a hello world program on Mesos. This is the recommended document for new Aurora users
+to start getting up to speed on the system.
+
+- [Prerequisite](#setup-install-aurora)
+- [The Script](#the-script)
+- [Aurora Configuration](#aurora-configuration)
+- [Creating the Job](#creating-the-job)
+- [Watching the Job Run](#watching-the-job-run)
+- [Cleanup](#cleanup)
+- [Next Steps](#next-steps)
+
+
+## Prerequisite
+
+This tutorial assumes you are running [Aurora locally using Vagrant](vagrant.md).
+However, in general the instructions are also applicable to any other
+[Aurora installation](../operations/installation.md).
+
+Unless otherwise stated, all commands are to be run from the root of the Aurora
+repository clone.
+
+
+## The Script
+
+Our "hello world" application is a simple Python script that loops
+forever, displaying the time every few seconds. Copy the code below and
+put it in a file named `hello_world.py` in the root of your Aurora repository clone
+(Note: this directory is the same as `/vagrant` inside the Vagrant VMs).
+
+The script has an intentional bug, which we will explain later on.
+
+<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
+-->
+```python
+import time
+
+def main():
+  SLEEP_DELAY = 10
+  # Python experts - ignore this blatant bug.
+  for i in xrang(100):
+    print("Hello world! The time is now: %s. Sleeping for %d secs" % (
+      time.asctime(), SLEEP_DELAY))
+    time.sleep(SLEEP_DELAY)
+
+if __name__ == "__main__":
+  main()
+```
+
+## Aurora Configuration
+
+Once we have our script/program, we need to create a *configuration
+file* that tells Aurora how to manage and launch our Job. Save the code
+below in a file named `hello_world.aurora`.
+
+<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
+-->
+```python
+pkg_path = '/vagrant/hello_world.py'
+
+# we use a trick here to make the configuration change with
+# the contents of the file, for simplicity.  in a normal setting, packages would be
+# versioned, and the version number would be changed in the configuration.
+import hashlib
+with open(pkg_path, 'rb') as f:
+  pkg_checksum = hashlib.md5(f.read()).hexdigest()
+
+# copy hello_world.py into the local sandbox
+install = Process(
+  name = 'fetch_package',
+  cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum))
+
+# run the script
+hello_world = Process(
+  name = 'hello_world',
+  cmdline = 'python -u hello_world.py')
+
+# describe the task
+hello_world_task = SequentialTask(
+  processes = [install, hello_world],
+  resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB))
+
+jobs = [
+  Service(cluster = 'devcluster',
+          environment = 'devel',
+          role = 'www-data',
+          name = 'hello_world',
+          task = hello_world_task)
+]
+```
+
+There is a lot going on in that configuration file:
+
+1. From a "big picture" viewpoint, it first defines two
+Processes. Then it defines a Task that runs the two Processes in the
+order specified in the Task definition, as well as specifying what
+computational and memory resources are available for them.  Finally,
+it defines a Job that will schedule the Task on available and suitable
+machines. This Job is the sole member of a list of Jobs; you can
+specify more than one Job in a config file.
+
+2. At the Process level, it specifies how to get your code into the
+local sandbox in which it will run. It then specifies how the code is
+actually run once the second Process starts.
+
+For more about Aurora configuration files, see the [Configuration
+Tutorial](../reference/configuration-tutorial.md) and the [Configuration
+Reference](../reference/configuration.md) (preferably after finishing this
+tutorial).
+
+
+## Creating the Job
+
+We're ready to launch our job! To do so, we use the Aurora Client to
+issue a Job creation request to the Aurora scheduler.
+
+Many Aurora Client commands take a *job key* argument, which uniquely
+identifies a Job. A job key consists of four parts, each separated by a
+"/". The four parts are `<cluster>/<role>/<environment>/<jobname>`
+in that order:
+
+* Cluster refers to the name of a particular Aurora installation.
+* Role names are user accounts existing on the slave machines. If you
+don't know what accounts are available, contact your sysadmin.
+* Environment names are namespaces; you can count on `test`, `devel`,
+`staging` and `prod` existing.
+* Jobname is the custom name of your job.
+
+When comparing two job keys, if any of the four parts is different from
+its counterpart in the other key, then the two job keys identify two separate
+jobs. If all four values are identical, the job keys identify the same job.
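+
+For example, the following two job keys differ only in their environment part, and therefore
+identify two separate jobs:
+
+    devcluster/www-data/devel/hello_world
+    devcluster/www-data/prod/hello_world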
+
+The `clusters.json` [client configuration](../reference/client-cluster-configuration.md)
+for the Aurora scheduler defines the available cluster names.
+For Vagrant, from the top-level of your Aurora repository clone, do:
+
+    $ vagrant ssh
+
+Followed by:
+
+    vagrant@aurora:~$ cat /etc/aurora/clusters.json
+
+You'll see something like the following. The `name` value shown here corresponds to a job key's cluster value.
+
+```javascript
+[{
+  "name": "devcluster",
+  "zk": "192.168.33.7",
+  "scheduler_zk_path": "/aurora/scheduler",
+  "auth_mechanism": "UNAUTHENTICATED",
+  "slave_run_directory": "latest",
+  "slave_root": "/var/lib/mesos"
+}]
+```
+
+The Aurora Client command that actually runs our Job is `aurora job create`. It creates a Job as
+specified by its job key and configuration file arguments and runs it.
+
+    aurora job create <cluster>/<role>/<environment>/<jobname> <config_file>
+
+Or for our example:
+
+    aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+
+After entering our virtual machine using `vagrant ssh`, this returns:
+
+    vagrant@aurora:~$ aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+     INFO] Creating job hello_world
+     INFO] Checking status of devcluster/www-data/devel/hello_world
+    Job create succeeded: job url=http://aurora.local:8081/scheduler/www-data/devel/hello_world
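+
+If you prefer the command line to the web UI, you can also check on the job with the client
+(a quick look; `aurora job status` reports the job's active and completed tasks):
+
+    vagrant@aurora:~$ aurora job status devcluster/www-data/devel/hello_world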
+
+
+## Watching the Job Run
+
+Now that our job is running, let's see what it's doing. Access the
+scheduler web interface at `http://$scheduler_hostname:$scheduler_port/scheduler`,
+or at `http://192.168.33.7:8081/scheduler` when using Vagrant.
+First we see what Jobs are scheduled:
+
+![Scheduled Jobs](../images/ScheduledJobs.png)
+
+Click on your user name, which in this case is `www-data`, to see the Jobs associated
+with that role:
+
+![Role Jobs](../images/RoleJobs.png)
+
+If you click on your `hello_world` Job, you'll see:
+
+![hello_world Job](../images/HelloWorldJob.png)
+
+Oops, looks like our first job didn't quite work! The task is temporarily throttled for
+having failed on every attempt of the Aurora scheduler to run it. We have to figure out
+what is going wrong.
+
+On the Completed tasks tab, we see all past attempts of the Aurora scheduler to run our job.
+
+![Completed tasks tab](../images/CompletedTasks.png)
+
+We can navigate to the Task page of a failed run by clicking on the host link.
+
+![Task page](../images/TaskBreakdown.png)
+
+Once there, we see that the `hello_world` process failed. The Task page
+captures the standard error and standard output streams and makes them available.
+Clicking through to `stderr` on the failed `hello_world` process, we see what happened.
+
+![stderr page](../images/stderr.png)
+
+It looks like we made a typo in our Python script. We wanted `xrange`,
+not `xrang`. Edit the `hello_world.py` script to use the correct function
+and save it as `hello_world_v2.py`. Then point the `pkg_path` in the
+`hello_world.aurora` configuration at the new file.
+
+In order to try again, we can now instruct the scheduler to update our job:
+
+    vagrant@aurora:~$ aurora update start devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+     INFO] Starting update for: hello_world
+    Job update has started. View your update progress at http://aurora.local:8081/scheduler/www-data/devel/hello_world/update/8ef38017-e60f-400d-a2f2-b5a8b724e95b
+
+This time, the task comes up.
+
+![Running Job](../images/RunningJob.png)
+
+By again clicking on the host, we inspect the Task page, and see that the
+`hello_world` process is running.
+
+![Running Task page](../images/runningtask.png)
+
+We then inspect the output by clicking on `stdout` and see our process's
+output:
+
+![stdout page](../images/stdout.png)
+
+## Cleanup
+
+Now that we're done, we kill the job using the Aurora client:
+
+    vagrant@aurora:~$ aurora job killall devcluster/www-data/devel/hello_world
+     INFO] Killing tasks for job: devcluster/www-data/devel/hello_world
+     INFO] Instances to be killed: [0]
+    Successfully killed instances [0]
+    Job killall succeeded
+
+The job page now shows the `hello_world` tasks as completed.
+
+![Killed Task page](../images/killedtask.png)
+
+## Next Steps
+
+Now that you've finished this Tutorial, you should read or do the following:
+
+- [The Aurora Configuration Tutorial](../reference/configuration-tutorial.md), which provides more examples
+  and best practices for writing Aurora configurations. You should also look at
+  the [Aurora Configuration Reference](../reference/configuration.md).
+- Explore the Aurora Client - use `aurora -h`, and read the
+  [Aurora Client Commands](../reference/client-commands.md) document.

Added: aurora/site/source/documentation/0.13.0/getting-started/vagrant.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/getting-started/vagrant.md?rev=1739360&view=auto
==============================================================================
--- aurora/site/source/documentation/0.13.0/getting-started/vagrant.md (added)
+++ aurora/site/source/documentation/0.13.0/getting-started/vagrant.md Fri Apr 15 20:21:30 2016
@@ -0,0 +1,137 @@
+A Local Cluster with Vagrant
+============================
+
+This document shows you how to configure a complete cluster using a virtual machine. This setup
+replicates a real cluster on your development machine as closely as possible. After you complete
+the steps outlined here, you will be ready to create and run your first Aurora job.
+
+The following sections describe these steps in detail:
+
+1. [Overview](#user-content-overview)
+1. [Install VirtualBox and Vagrant](#user-content-install-virtualbox-and-vagrant)
+1. [Clone the Aurora repository](#user-content-clone-the-aurora-repository)
+1. [Start the local cluster](#user-content-start-the-local-cluster)
+1. [Log onto the VM](#user-content-log-onto-the-vm)
+1. [Run your first job](#user-content-run-your-first-job)
+1. [Rebuild components](#user-content-rebuild-components)
+1. [Shut down or delete your local cluster](#user-content-shut-down-or-delete-your-local-cluster)
+1. [Troubleshooting](#user-content-troubleshooting)
+
+
+Overview
+--------
+
+The Aurora distribution includes a set of scripts that enable you to create a local cluster on
+your development machine. These scripts use [Vagrant](https://www.vagrantup.com/) and
+[VirtualBox](https://www.virtualbox.org/) to run and configure a virtual machine. Once the
+virtual machine is running, the scripts install and initialize Aurora and any required components
+to create the local cluster.
+
+
+Install VirtualBox and Vagrant
+------------------------------
+
+First, download and install [VirtualBox](https://www.virtualbox.org/) on your development machine.
+
+Then download and install [Vagrant](https://www.vagrantup.com/). To verify that the installation
+was successful, open a terminal window and type the `vagrant` command. You should see a list of
+common commands for this tool.
+
+
+Clone the Aurora repository
+---------------------------
+
+To obtain the Aurora source distribution, clone its Git repository using the following command:
+
+     git clone git://git.apache.org/aurora.git
+
+
+Start the local cluster
+-----------------------
+
+Now change into the `aurora/` directory, which contains the Aurora source code and
+other scripts and tools:
+
+     cd aurora/
+
+To start the local cluster, type the following command:
+
+     vagrant up
+
+This command uses the configuration scripts in the Aurora distribution to:
+
+* Download a Linux system image.
+* Start a virtual machine (VM) and configure it.
+* Install the required build tools on the VM.
+* Install Aurora's requirements (like [Mesos](http://mesos.apache.org/) and
+[Zookeeper](http://zookeeper.apache.org/)) on the VM.
+* Build and install Aurora from source on the VM.
+* Start Aurora's services on the VM.
+
+This process takes several minutes to complete.
+
+To verify that Aurora is running on the cluster, visit the following URLs:
+
+* Scheduler - http://192.168.33.7:8081
+* Observer - http://192.168.33.7:1338
+* Mesos Master - http://192.168.33.7:5050
+* Mesos Slave - http://192.168.33.7:5051
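+
+From the host machine you can also confirm that the VM itself is up, using Vagrant's own tooling:
+
+     vagrant status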
+
+
+Log onto the VM
+---------------
+
+To SSH into the VM, run the following command on your development machine:
+
+     vagrant ssh
+
+To verify that Aurora is installed in the VM, type the `aurora` command. You should see a list
+of arguments and possible commands.
+
+The `/vagrant` directory on the VM is mapped to the `aurora/` local directory
+from which you started the cluster. You can edit files inside this directory in your development
+machine and access them from the VM under `/vagrant`.
+
+A pre-installed `clusters.json` file refers to your local cluster as `devcluster`, which you
+will use in client commands.
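+
+For example, once inside the VM you can exercise the client against the local cluster by listing
+jobs under a role (a minimal check; `www-data` is simply the role used in the
+[tutorial](tutorial.md)):
+
+     aurora job list devcluster/www-data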
+
+
+Run your first job
+------------------
+
+Now that your cluster is up and running, you are ready to define and run your first job in Aurora.
+For more information, see the [Aurora Tutorial](tutorial.md).
+
+
+Rebuild components
+------------------
+
+If you are changing Aurora code and would like to rebuild a component, you can use the `aurorabuild`
+command on the VM to build and restart a component.  This is considerably faster than destroying
+and rebuilding your VM.
+
+`aurorabuild` accepts a list of components to build and update; invoke it with no arguments to
+see the list of supported components. For example, to rebuild and restart the client:
+
+     vagrant ssh -c 'aurorabuild client'
+
+
+Shut down or delete your local cluster
+--------------------------------------
+
+To shut down your local cluster, run the `vagrant halt` command in your development machine. To
+start it again, run the `vagrant up` command.
+
+Once you are finished with your local cluster, or if you would otherwise like to start from scratch,
+you can use the command `vagrant destroy` to turn off and delete the virtual machine.
+
+
+Troubleshooting
+---------------
+
+Most Vagrant-related problems can be fixed with the following steps:
+
+* Destroying the vagrant environment with `vagrant destroy`
+* Killing any orphaned VMs (see AURORA-499) with the VirtualBox UI or the `VBoxManage` command line tool
+* Cleaning the repository of build artifacts and other intermediate output with `git clean -fdx`
+* Bringing up the vagrant environment with `vagrant up`

Added: aurora/site/source/documentation/0.13.0/images/CPUavailability.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/CPUavailability.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/CPUavailability.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/CompletedTasks.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/CompletedTasks.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/CompletedTasks.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/HelloWorldJob.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/HelloWorldJob.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/HelloWorldJob.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/RoleJobs.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/RoleJobs.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/RoleJobs.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/RunningJob.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/RunningJob.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/RunningJob.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/ScheduledJobs.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/ScheduledJobs.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/ScheduledJobs.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/TaskBreakdown.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/TaskBreakdown.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/TaskBreakdown.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/aurora_hierarchy.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/aurora_hierarchy.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/aurora_hierarchy.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/aurora_logo.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/aurora_logo.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/aurora_logo.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/components.odg
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/components.odg?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/components.odg
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/components.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/components.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/components.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/debug-client-test.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/debug-client-test.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/debug-client-test.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/debugging-client-test.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/debugging-client-test.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/debugging-client-test.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/killedtask.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/killedtask.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/killedtask.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/lifeofatask.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/lifeofatask.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/lifeofatask.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_adopters_panel_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_adopters_panel_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_adopters_panel_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_at_tellapart_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_at_tellapart_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_at_tellapart_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_at_twitter_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_at_twitter_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/02_19_2015_aurora_at_twitter_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/02_28_2015_apache_aurora_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/02_28_2015_apache_aurora_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/02_28_2015_apache_aurora_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/03_07_2015_aurora_mesos_in_practice_at_twitter_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/03_07_2015_aurora_mesos_in_practice_at_twitter_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/03_07_2015_aurora_mesos_in_practice_at_twitter_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/03_25_2014_introduction_to_aurora_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/03_25_2014_introduction_to_aurora_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/03_25_2014_introduction_to_aurora_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/04_30_2015_monolith_to_microservices_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/04_30_2015_monolith_to_microservices_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/04_30_2015_monolith_to_microservices_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/08_21_2014_past_present_future_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/08_21_2014_past_present_future_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/08_21_2014_past_present_future_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/09_20_2015_shipping_code_with_aurora_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/09_20_2015_shipping_code_with_aurora_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/09_20_2015_shipping_code_with_aurora_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/09_20_2015_twitter_production_scale_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/09_20_2015_twitter_production_scale_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/09_20_2015_twitter_production_scale_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/runningtask.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/runningtask.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/runningtask.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/stderr.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/stderr.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/stderr.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/stdout.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/stdout.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/stdout.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/images/storage_hierarchy.png
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/images/storage_hierarchy.png?rev=1739360&view=auto
==============================================================================
Binary file - no diff available.

Propchange: aurora/site/source/documentation/0.13.0/images/storage_hierarchy.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: aurora/site/source/documentation/0.13.0/index.html.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/index.html.md?rev=1739360&view=auto
==============================================================================
--- aurora/site/source/documentation/0.13.0/index.html.md (added)
+++ aurora/site/source/documentation/0.13.0/index.html.md Fri Apr 15 20:21:30 2016
@@ -0,0 +1,73 @@
+## Introduction
+
+Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run
+long-running services, cron jobs, and ad-hoc jobs that take advantage of Apache Mesos' scalability,
+fault-tolerance, and resource isolation.
+
+We encourage you to ask questions on the [Aurora user list](http://aurora.apache.org/community/) or
+the `#aurora` IRC channel on `irc.freenode.net`.
+
+
+## Getting Started
+Information for everyone new to Apache Aurora.
+
+ * [Aurora System Overview](getting-started/overview.md)
+ * [Hello World Tutorial](getting-started/tutorial.md)
+ * [Local cluster with Vagrant](getting-started/vagrant.md)
+
+## Features
+Description of important Aurora features.
+
+ * [Containers](features/containers.md)
+ * [Cron Jobs](features/cron-jobs.md)
+ * [Job Updates](features/job-updates.md)
+ * [Multitenancy](features/multitenancy.md)
+ * [Resource Isolation](features/resource-isolation.md)
+ * [Scheduling Constraints](features/constraints.md)
+ * [Services](features/services.md)
+ * [Service Discovery](features/service-discovery.md)
+ * [SLA Metrics](features/sla-metrics.md)
+
+## Operators
+For those who wish to manage and fine-tune an Aurora cluster.
+
+ * [Installation](operations/installation.md)
+ * [Configuration](operations/configuration.md)
+ * [Monitoring](operations/monitoring.md)
+ * [Security](operations/security.md)
+ * [Storage](operations/storage.md)
+ * [Backup](operations/backup-restore.md)
+
+## Reference
+The complete reference of commands, configuration options, and scheduler internals.
+
+ * [Task lifecycle](reference/task-lifecycle.md)
+ * Configuration (`.aurora` files)
+    - [Configuration Reference](reference/configuration.md)
+    - [Configuration Tutorial](reference/configuration-tutorial.md)
+    - [Configuration Best Practices](reference/configuration-best-practices.md)
+    - [Configuration Templating](reference/configuration-templating.md)
+ * Aurora Client
+    - [Client Commands](reference/client-commands.md)
+    - [Client Hooks](reference/client-hooks.md)
+    - [Client Cluster Configuration](reference/client-cluster-configuration.md)
+ * [Scheduler Configuration](reference/scheduler-configuration.md)
+
+## Additional Resources
+ * [Tools integrating with Aurora](additional-resources/tools.md)
+ * [Presentation videos and slides](additional-resources/presentations.md)
+
+## Developers
+All the information you need to start modifying Aurora and contributing back to the project.
+
+ * [Contributing to the project](contributing/)
+ * [Committer's Guide](development/committers-guide.md)
+ * [Design Documents](development/design-documents.md)
+ * Developing the Aurora components:
+     - [Client](development/client.md)
+     - [Scheduler](development/scheduler.md)
+     - [Scheduler UI](development/ui.md)
+     - [Thermos](development/thermos.md)
+     - [Thrift structures](development/thrift.md)
+
+

Added: aurora/site/source/documentation/0.13.0/operations/backup-restore.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/operations/backup-restore.md?rev=1739360&view=auto
==============================================================================
--- aurora/site/source/documentation/0.13.0/operations/backup-restore.md (added)
+++ aurora/site/source/documentation/0.13.0/operations/backup-restore.md Fri Apr 15 20:21:30 2016
@@ -0,0 +1,91 @@
+# Recovering from a Scheduler Backup
+
+**Be sure to read the entire page before attempting to restore from a backup, as it may have
+unintended consequences.**
+
+# Summary
+
+The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an
+earlier, backed-up version and requires all schedulers to be taken down temporarily while
+restoring. Once completed, the scheduler state resets to what it was when the backup was created.
+This means any jobs/tasks created or updated after the backup are unknown to the scheduler and will
+be killed shortly after the cluster restarts. All other tasks continue operating as normal.
+
+Usually, it is a bad idea to restore a backup that is not extremely recent (i.e. older than a few
+hours). This is because the scheduler will expect the cluster to look exactly as the backup does,
+so any tasks that have been rescheduled since the backup was taken will be killed.
+
+The instructions below have been verified in the [Vagrant environment](../getting-started/vagrant.md) and,
+with minor syntax/path changes, should be applicable to any Aurora cluster.
+
+# Preparation
+
+Follow these steps to prepare the cluster for restoring from a backup:
+
+* Stop all scheduler instances
+
+* Consider blocking external traffic on the port defined in `-http_port` for all schedulers to
+prevent users from interacting with the scheduler during the restoration process. This will aid
+troubleshooting by reducing scheduler log noise and prevent users from making changes that will
+be erased after the backup snapshot is restored.
+
+* Configure `aurora_admin` access to run all commands listed in
+  [Restore from backup](#restore-from-backup) section locally on the leading scheduler:
+  * Make sure the [clusters.json](../reference/client-cluster-configuration.md) file is configured
+    to access the scheduler directly: set the `scheduler_uri` setting and remove `zk`. Since the
+    leader can get re-elected during the restore steps, consider doing this on all scheduler replicas.
+  * Depending on your particular security approach, you will need to either turn off scheduler
+    authorization by removing the scheduler's `-http_authentication_mechanism` flag, or make sure
+    that direct scheduler access is properly authorized. E.g.: in the case of Kerberos you will need
+    to change the `/etc/hosts` file to map your local IP to the scheduler URL configured in the keytabs:
+
+        <local_ip> <scheduler_domain_in_keytabs>
+
+* The next steps put the scheduler into a partially disabled state in which it can still
+accept storage recovery requests but cannot schedule tasks or change task states. This may be
+accomplished by updating the following scheduler configuration options:
+  * Set `-mesos_master_address` to a non-existent zk address. This will prevent the scheduler from
+    registering with Mesos. E.g.: `-mesos_master_address=zk://localhost:1111/mesos/master`
+  * Set `-max_registration_delay` to a sufficiently long interval to prevent a registration timeout
+    and, as a result, scheduler suicide. E.g: `-max_registration_delay=360mins`
+  * Make sure the `-reconciliation_initial_delay` option is set high enough (e.g.: `365days`) to
+    prevent accidental task GC. This is important, as the scheduler will attempt to reconcile the
+    cluster state and will kill all tasks when restarted with an empty Mesos replicated log.
+
+* Restart all schedulers
+
+# Cleanup and re-initialize Mesos replicated log
+
+Get rid of the corrupted files and re-initialize Mesos replicated log:
+
+* Stop schedulers
+* Delete all files under `-native_log_file_path` on all schedulers
+* Initialize Mesos replica's log file: `sudo mesos-log initialize --path=<-native_log_file_path>`
+* Start schedulers
+
+# Restore from backup
+
+At this point the scheduler is ready to rehydrate from the backup:
+
+* Identify the leading scheduler by:
+  * examining the `scheduler_lifecycle_LEADER_AWAITING_REGISTRATION` metric at the scheduler
+    `/vars` endpoint (the leader reports 1, all other replicas report 0; see the example after
+    this list),
+  * examining scheduler logs,
+  * or examining the ZooKeeper registration under the path defined by `-zk_endpoints`
+    and `-serverset_path`.
+
+* Locate the desired backup file, copy it to the leading scheduler's `-backup_dir` folder, and stage
+recovery by running the following command on the leader:
+`aurora_admin scheduler_stage_recovery --bypass-leader-redirect <cluster> scheduler-backup-<yyyy-MM-dd-HH-mm>`
+
+* At this point, the recovery snapshot is staged and available for manual verification/modification
+via `aurora_admin scheduler_print_recovery_tasks --bypass-leader-redirect` and
+`scheduler_delete_recovery_tasks --bypass-leader-redirect` commands.
+See `aurora_admin help <command>` for usage details.
+
+* Commit recovery. This instructs the scheduler to overwrite the existing Mesos replicated log with
+the provided backup snapshot and initiate a mandatory failover:
+`aurora_admin scheduler_commit_recovery --bypass-leader-redirect <cluster>`
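+
+As an illustration of the leader check mentioned above (the host is a placeholder and `8081` is
+merely a common `-http_port` value; substitute your own):
+
+    curl -s http://<scheduler_host>:8081/vars | grep scheduler_lifecycle_LEADER_AWAITING_REGISTRATION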
+
+# Cleanup
+Undo any modifications made during the [Preparation](#preparation) sequence.

Added: aurora/site/source/documentation/0.13.0/operations/configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/operations/configuration.md?rev=1739360&view=auto
==============================================================================
--- aurora/site/source/documentation/0.13.0/operations/configuration.md (added)
+++ aurora/site/source/documentation/0.13.0/operations/configuration.md Fri Apr 15 20:21:30 2016
@@ -0,0 +1,187 @@
+# Scheduler Configuration
+
+The Aurora scheduler can take a variety of configuration options through command-line arguments.
+Examples are available under `examples/scheduler/`. For a list of available Aurora flags and their
+documentation, see [Scheduler Configuration Reference](../reference/scheduler-configuration.md).
+
+
+## A Note on Configuration
+Like Mesos, Aurora uses command-line flags for runtime configuration. As such, the Aurora
+"configuration file" is typically a `scheduler.sh` shell script of the following form:
+
+    #!/bin/bash
+    AURORA_HOME=/usr/local/aurora-scheduler
+
+    # Flags controlling the JVM.
+    JAVA_OPTS=(
+      -Xmx2g
+      -Xms2g
+      # GC tuning, etc.
+    )
+
+    # Flags controlling the scheduler.
+    AURORA_FLAGS=(
+      # Port for client RPCs and the web UI
+      -http_port=8081
+      # Log configuration, etc.
+    )
+
+    # Environment variables controlling libmesos
+    export JAVA_HOME=...
+    export GLOG_v=1
+    # Port used to communicate with the Mesos master and for the replicated log
+    export LIBPROCESS_PORT=8083
+
+    JAVA_OPTS="${JAVA_OPTS[*]}" exec "$AURORA_HOME/bin/aurora-scheduler" "${AURORA_FLAGS[@]}"
+
+That way Aurora's current flags are visible in `ps` and in the `/vars` admin endpoint.
+
+
+## Replicated Log Configuration
+
+Aurora schedulers use ZooKeeper to discover log replicas and elect a leader. Only one scheduler is
+leader at a given time - the other schedulers follow log writes and prepare to take over as leader
+but do not communicate with the Mesos master. Either 3 or 5 schedulers are recommended in a
+production deployment, depending on failure tolerance, and they must have persistent storage.
+
+Below is a summary of scheduler storage configuration flags that either don't have default values
+or require attention before deploying in a production environment.
+
+### `-native_log_quorum_size`
+Defines the Mesos replicated log quorum size. In a cluster with `N` schedulers, the flag
+`-native_log_quorum_size` should be set to `floor(N/2) + 1`. So in a cluster with 1 scheduler
+it should be set to `1`, in a cluster with 3 it should be set to `2`, and in a cluster of 5 it
+should be set to `3`.
+
+  Number of schedulers (N) | `-native_log_quorum_size` setting (`floor(N/2) + 1`)
+  ------------------------ | -----------------------------------------------------
+  1                        | 1
+  3                        | 2
+  5                        | 3
+  7                        | 4
+
+*Incorrectly setting this flag will cause data corruption!*
+
+### `-native_log_file_path`
+Location of the Mesos replicated log files. Consider allocating a dedicated disk (preferably SSD)
+for Mesos replicated log files to ensure optimal storage performance.
+
+### `-native_log_zk_group_path`
+ZooKeeper path used for Mesos replicated log quorum discovery.
+
+See [code](../../src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
+other available Mesos replicated log configuration options and default values.
+
+### Changing the Quorum Size
+Special care needs to be taken when changing the size of the Aurora scheduler quorum.
+Since Aurora uses a Mesos replicated log, similar steps need to be followed as when
+[changing the mesos quorum size](http://mesos.apache.org/documentation/latest/operational-guide).
+
+In preparation, increase `-native_log_quorum_size` on each existing scheduler and restart them.
+For example, when growing from 3 to 5 schedulers, the quorum size grows from 2 to 3.
+
+When starting the new schedulers, set `-native_log_quorum_size` to the new value. Failing to
+first increase the quorum size on the running schedulers can in some cases result in corruption
+or truncation of the replicated log used by Aurora. In that case, see the documentation on
+[recovering from backup](backup-restore.md).
+
+
+## Backup Configuration
+
+Configuration options for the Aurora scheduler backup manager.
+
+### `-backup_interval`
+The interval on which the scheduler writes local storage backups.  The default is every hour.
+
+### `-backup_dir`
+Directory to write backups to.
+
+### `-max_saved_backups`
+Maximum number of backups to retain before deleting the oldest backup(s).
+
+
+## Process Logs
+
+### Log destination
+By default, Thermos will write process stdout/stderr to log files in the sandbox. Process object configuration
+allows specifying alternate log file destinations like streamed stdout/stderr or suppression of all log output.
+Default behavior can be configured for the entire cluster with the following flag (through the `-thermos_executor_flags`
+argument to the Aurora scheduler):
+
+    --runner-logger-destination=both
+
+The `both` setting sends logs to files and also streams them to the parent stdout/stderr outputs.
+
+See [Configuration Reference](../reference/configuration.md#logger) for all destination options.
+
+### Log rotation
+By default, Thermos will not rotate the stdout/stderr logs from child processes and they will grow
+without bound. An individual user may change this behavior via configuration on the Process object,
+but it may also be desirable to change the default configuration for the entire cluster.
+In order to enable rotation by default, the following flags can be applied to Thermos (through the
+`-thermos_executor_flags` argument to the Aurora scheduler):
+
+    --runner-logger-mode=rotate
+    --runner-rotate-log-size-mb=100
+    --runner-rotate-log-backups=10
+
+In the above example, each instance of the Thermos runner will rotate stderr/stdout logs once they
+reach 100 MiB in size and keep a maximum of 10 backups. If a user has provided a custom setting for
+their process, it will override these default settings.
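+
+As a sketch of such a per-process override (assuming the `Logger`/`RotatePolicy` schema described
+in the [Configuration Reference](../reference/configuration.md#logger)), a `.aurora` file might
+configure rotation for a single Process like this:
+
+    # hypothetical process; only the logger stanza is the point here
+    rotated = Process(
+      name = 'rotated',
+      cmdline = 'my_server',
+      logger = Logger(
+        mode = 'rotate',
+        rotate = RotatePolicy(log_size = 100*MB, backups = 10)))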
+
+
+
+## Thermos Executor Wrapper
+
+If you need to do computation before starting the Thermos executor (for example, setting a different
+`--announcer-hostname` parameter for every executor), then the Thermos executor should be invoked
+inside a wrapper script. In such a case, the Aurora scheduler should be started with
+`-thermos_executor_path` pointing to the wrapper script and `-thermos_executor_resources`
+set to a comma-separated string of all the resources that should be copied into
+the sandbox (including the original Thermos executor).
+
+For example, to wrap the executor inside a simple wrapper, the scheduler would be started like this:
+`-thermos_executor_path=/path/to/wrapper.sh -thermos_executor_resources=/usr/share/aurora/bin/thermos_executor.pex`
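+
+A minimal sketch of such a wrapper (hypothetical; it assumes the executor pex named in
+`-thermos_executor_resources` has been copied into the sandbox next to the wrapper):
+
+    #!/bin/bash
+    # Compute a per-host value before delegating to the real executor.
+    ANNOUNCE_HOST="$(hostname -f)"
+    # Hand all scheduler-supplied arguments through unchanged.
+    exec ./thermos_executor.pex --announcer-hostname="${ANNOUNCE_HOST}" "$@"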
+
+
+
+### Docker containers
+In order for Aurora to launch jobs using docker containers, a few extra configuration options
+must be set.  The [docker containerizer](http://mesos.apache.org/documentation/latest/docker-containerizer/)
+must be enabled on the Mesos slaves by launching them with the `--containerizers=docker,mesos` option.
+
+By default, Aurora will configure Mesos to copy the file specified in `-thermos_executor_path`
+into the container's sandbox.  If using a wrapper script to launch the Thermos executor,
+specify the path to the wrapper in that argument. In addition, the path to the executor pex itself
+must be included in the `-thermos_executor_resources` option. Doing so will ensure that both the
+wrapper script and executor are correctly copied into the sandbox. Finally, ensure the wrapper
+script does not access resources outside of the sandbox, since those resources will not exist
+when the script is run from within a Docker container.
+
+A scheduler flag, `-global_container_mounts`, allows mounting paths from the host (i.e., the slave)
+into all containers on that host. The format is a comma-separated list of `host_path:container_path[:mode]`
+tuples. For example `-global_container_mounts=/opt/secret_keys_dir:/mnt/secret_keys_dir:ro` mounts
+`/opt/secret_keys_dir` from the slaves into all launched containers. Valid modes are `ro` and `rw`.
+
+If you would like to run a container with a read-only filesystem, it may also be necessary to
+use the scheduler flag `-thermos_home_in_sandbox` in order to set `HOME` to the sandbox
+before the executor runs. This will make sure that the executor/runner PEX extraction happens
+inside the sandbox instead of the container filesystem root.
+
+If you would like to supply your own parameters to `docker run` when launching jobs in docker
+containers, you may use the following flags:
+
+    -allow_docker_parameters
+    -default_docker_parameters
+
+`-allow_docker_parameters` controls whether or not users may pass their own configuration parameters
+through the job configuration files. If set to `false` (the default), the scheduler will reject
+jobs with custom parameters. *NOTE*: this setting should be used with caution as it allows any job
+owner to specify any parameters they wish, including those that may introduce security concerns
+(`privileged=true`, for example).
+
+`-default_docker_parameters` allows a cluster operator to specify a universal set of parameters that
+should be used for every container that does not have parameters explicitly configured at the job
+level. The argument accepts a multimap format:
+
+    -default_docker_parameters="read-only=true,tmpfs=/tmp,tmpfs=/run"

Added: aurora/site/source/documentation/0.13.0/operations/installation.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/operations/installation.md?rev=1739360&view=auto
==============================================================================
--- aurora/site/source/documentation/0.13.0/operations/installation.md (added)
+++ aurora/site/source/documentation/0.13.0/operations/installation.md Fri Apr 15 20:21:30 2016
@@ -0,0 +1,324 @@
+# Installing Aurora
+
+Source and binary distributions can be found on our
+[downloads](https://aurora.apache.org/downloads/) page.  Installing from binary packages is
+recommended for most users.
+
+- [Installing the scheduler](#installing-the-scheduler)
+- [Installing worker components](#installing-worker-components)
+- [Installing the client](#installing-the-client)
+- [Installing Mesos](#installing-mesos)
+- [Troubleshooting](#troubleshooting)
+
+If our binary packages don't suit you, our package build toolchain makes it easy to build your
+own packages. See the [instructions](https://github.com/apache/aurora-packaging) to learn how.
+
+
+## Machine profiles
+
+Given that many of these components communicate over the network, there are numerous ways you could
+assemble them to create an Aurora cluster.  The simplest way is to think in terms of three machine
+profiles:
+
+### Coordinator
+**Components**: ZooKeeper, Aurora scheduler, Mesos master
+
+A small number of machines (typically 3 or 5) responsible for cluster orchestration.  In most cases
+it is fine to co-locate these components in anything but very large clusters (> 1000 machines).
+Beyond that point, operators will likely want to manage these services on separate machines.
+
+In practice, 5 coordinators have been shown to reliably manage clusters with tens of thousands of
+machines.
+
+### Worker
+**Components**: Aurora executor, Aurora observer, Mesos agent
+
+The bulk of the cluster, where services will actually run.
+
+### Client
+**Components**: Aurora client, Aurora admin client
+
+Any machines that users submit jobs from.
+
+
+## Installing the scheduler
+### Ubuntu Trusty
+
+1. Install Mesos
+   Skip down to [install mesos](#mesos-on-ubuntu-trusty), then run:
+
+        sudo start mesos-master
+
+2. Install ZooKeeper
+
+        sudo apt-get install -y zookeeperd
+
+3. Install the Aurora scheduler
+
+        sudo add-apt-repository -y ppa:openjdk-r/ppa
+        sudo apt-get update
+        sudo apt-get install -y openjdk-8-jre-headless wget
+
+        sudo update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
+
+        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-scheduler_0.12.0_amd64.deb
+        sudo dpkg -i aurora-scheduler_0.12.0_amd64.deb
+
+### CentOS 7
+
+1. Install Mesos
+   Skip down to [install mesos](#mesos-on-centos-7), then run:
+
+        sudo systemctl start mesos-master
+
+2. Install ZooKeeper
+
+        sudo rpm -Uvh https://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
+        sudo yum install -y java-1.8.0-openjdk-headless zookeeper-server
+
+        sudo service zookeeper-server init
+        sudo systemctl start zookeeper-server
+
+3. Install the Aurora scheduler
+
+        sudo yum install -y wget
+
+        wget -c https://apache.bintray.com/aurora/centos-7/aurora-scheduler-0.12.0-1.el7.centos.aurora.x86_64.rpm
+        sudo yum install -y aurora-scheduler-0.12.0-1.el7.centos.aurora.x86_64.rpm
+
+### Finalizing
+By default, the scheduler will start in an uninitialized mode.  This is because external
+coordination is necessary to be certain operator error does not result in a quorum of schedulers
+starting up and believing their databases are empty when in fact they should be re-joining a
+cluster.
+
+Because of this, a fresh install of the scheduler will need intervention to start up.  First,
+stop the scheduler service:
+
+- Ubuntu: `sudo stop aurora-scheduler`
+- CentOS: `sudo systemctl stop aurora`
+
+Now initialize the database:
+
+    sudo -u aurora mkdir -p /var/lib/aurora/scheduler/db
+    sudo -u aurora mesos-log initialize --path=/var/lib/aurora/scheduler/db
+
+Now you can start the scheduler back up:
+
+- Ubuntu: `sudo start aurora-scheduler`
+- CentOS: `sudo systemctl start aurora`
+
+
+## Installing worker components
+### Ubuntu Trusty
+
+1. Install Mesos
+   Skip down to [install mesos](#mesos-on-ubuntu-trusty), then run:
+
+        start mesos-slave
+
+2. Install Aurora executor and observer
+
+        sudo apt-get install -y python2.7 wget
+
+        # NOTE: This appears to be a missing dependency of the mesos deb package and is needed
+        # for the python mesos native bindings.
+        sudo apt-get -y install libcurl4-nss-dev
+
+        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-executor_0.12.0_amd64.deb
+        sudo dpkg -i aurora-executor_0.12.0_amd64.deb
+
+### CentOS 7
+
+1. Install Mesos
+   Skip down to [install mesos](#mesos-on-centos-7), then run:
+
+        sudo systemctl start mesos-slave
+
+2. Install Aurora executor and observer
+
+        sudo yum install -y python2 wget
+
+        wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.12.0-1.el7.centos.aurora.x86_64.rpm
+        sudo yum install -y aurora-executor-0.12.0-1.el7.centos.aurora.x86_64.rpm
+
+### Configuration
+The executor typically does not require configuration.  Command line arguments can be passed to
+the executor via the scheduler's `-thermos_executor_flags` argument.
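+
+For example (the flag value here is illustrative), the scheduler could be started with:
+
+    -thermos_executor_flags="--announcer-ensemble localhost:2181"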
+
+The observer needs to be configured to look at the correct Mesos directory in order to find task
+sandboxes. You should first find the Mesos working directory by looking for the Mesos slave
+`--work_dir` flag. You should see something like:
+
+        ps -eocmd | grep "mesos-slave" | grep -v grep | tr ' ' '\n' | grep "\--work_dir"
+        --work_dir=/var/lib/mesos
+
+If the flag is not set, you can view the default value like so:
+
+        mesos-slave --help
+        Usage: mesos-slave [options]
+
+          ...
+          --work_dir=VALUE      Directory path to place framework work directories
+                                (default: /tmp/mesos)
+          ...
+
+The value you find for `--work_dir`, `/var/lib/mesos` in this example, should match the Aurora
+observer value for `--mesos-root`.  You can look for that setting in a similar way on a worker
+node by grepping for `thermos_observer` and `--mesos-root`.  If the flag is not set, you can view
+the default value like so:
+
+        thermos_observer -h
+        Options:
+          ...
+          --mesos-root=MESOS_ROOT
+                                The mesos root directory to search for Thermos
+                                executor sandboxes [default: /var/lib/mesos]
+          ...
+
+In this case the default is `/var/lib/mesos` and we have a match. If there is no match, you can
+either adjust the mesos-slave start script(s) and restart the slave(s) or else adjust the
+Aurora observer start scripts and restart the observers.  To adjust the Aurora observer:
+
+#### Ubuntu Trusty
+
+    sudo sh -c 'echo "MESOS_ROOT=/tmp/mesos" >> /etc/default/thermos'
+
+NB: In Aurora releases up through 0.12.0, you'll also need to edit `/etc/init/thermos.conf` like so:
+
+    diff -C 1 /etc/init/thermos.conf.orig /etc/init/thermos.conf
+    *** /etc/init/thermos.conf.orig 2016-03-22 22:34:46.286199718 +0000
+    --- /etc/init/thermos.conf  2016-03-22 17:09:49.357689038 +0000
+    ***************
+    *** 24,25 ****
+    --- 24,26 ----
+          --port=${OBSERVER_PORT:-1338} \
+    +     --mesos-root=${MESOS_ROOT:-/var/lib/mesos} \
+          --log_to_disk=NONE \
+
+#### CentOS 7
+
+Edit `/etc/sysconfig/thermos-observer` to add the `--mesos-root` flag, resulting in something like:
+
+    grep -A5 OBSERVER_ARGS /etc/sysconfig/thermos-observer
+    OBSERVER_ARGS=(
+      --port=1338
+      --mesos-root=/tmp/mesos
+      --log_to_disk=NONE
+      --log_to_stderr=google:INFO
+    )
+
+## Installing the client
+### Ubuntu Trusty
+
+    sudo apt-get install -y python2.7 wget
+
+    wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-tools_0.12.0_amd64.deb
+    sudo dpkg -i aurora-tools_0.12.0_amd64.deb
+
+### CentOS 7
+
+    sudo yum install -y python2 wget
+
+    wget -c https://apache.bintray.com/aurora/centos-7/aurora-tools-0.12.0-1.el7.centos.aurora.x86_64.rpm
+    sudo yum install -y aurora-tools-0.12.0-1.el7.centos.aurora.x86_64.rpm
+
+### Mac OS X
+
+    brew upgrade
+    brew install aurora-cli
+
+### Configuration
+Client configuration lives in a JSON file that describes the clusters available and how to reach
+them.  By default this file is at `/etc/aurora/clusters.json`.
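+
+A minimal `clusters.json` sketch (the cluster name, address, and paths here are illustrative):
+
+    [{
+      "name": "example",
+      "zk": "192.168.33.7",
+      "scheduler_zk_path": "/aurora/scheduler",
+      "auth_mechanism": "UNAUTHENTICATED",
+      "slave_run_directory": "latest",
+      "slave_root": "/var/lib/mesos"
+    }]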
+
+Jobs may be submitted to the scheduler using the client, and are described with
+[job configurations](../reference/configuration.md) expressed in `.aurora` files.  Typically you will
+maintain a single job configuration file to describe one or more deployment environments (e.g.
+dev, test, prod) for a production job.
+
+
+## Installing Mesos
+Mesos uses a single package for the Mesos master and slave.  As a result, the package dependencies
+are identical for both.
+
+### Mesos on Ubuntu Trusty
+
+    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
+    DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
+    CODENAME=$(lsb_release -cs)
+
+    echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | \
+      sudo tee /etc/apt/sources.list.d/mesosphere.list
+    sudo apt-get -y update
+
+    # Use `apt-cache showpkg mesos | grep [version]` to find the exact version.
+    sudo apt-get -y install mesos=0.25.0-0.2.70.ubuntu1404
+
+### Mesos on CentOS 7
+
+    sudo rpm -Uvh https://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm
+    sudo yum -y install mesos-0.25.0
+
+
+
+## Troubleshooting
+So you've started your first cluster and are running into some issues? We've collected some common
+stumbling blocks and solutions here to help get you moving.
+
+### Replicated log not initialized
+
+#### Symptoms
+- Scheduler RPCs and web interface claim `Storage is not READY`
+- Scheduler log repeatedly prints messages like
+
+  ```
+  I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status
+  received a broadcasted recover request
+  I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
+  from a replica in EMPTY status
+  ```
+
+#### Solution
+When you create a new cluster, you need to inform a quorum of schedulers that they are safe to
+consider their database to be empty by [initializing](#finalizing) the
+replicated log. This is done to prevent the scheduler from modifying the cluster state in the event
+of multiple simultaneous disk failures or, more likely, misconfiguration of the replicated log path.
+
+
+### Scheduler not registered
+
+#### Symptoms
+Scheduler log contains
+
+    Framework has not been registered within the tolerated delay.
+
+#### Solution
+Double-check that the scheduler is configured correctly to reach the Mesos master. If you are
+registering the master in ZooKeeper, make sure the command line argument to the master:
+
+    --zk=zk://$ZK_HOST:2181/mesos/master
+
+is the same as the one on the scheduler:
+
+    -mesos_master_address=zk://$ZK_HOST:2181/mesos/master
+
+
+### Scheduler not running
+
+#### Symptoms
+The scheduler process commits suicide regularly. This happens under error conditions, but
+also on purpose at regular intervals.
+
+#### Solution
+Aurora is meant to be run under supervision. You have to configure a supervisor like
+[Monit](http://mmonit.com/monit/) or [supervisord](http://supervisord.org/) to run the scheduler
+and restart it whenever it fails or exits on purpose.
+
+Aurora supports an active health checking protocol on its admin HTTP interface: if a `GET /health`
+times out or returns anything other than `200 OK`, the scheduler process is unhealthy and should be
+restarted.
+
+For example, monit can be configured with
+
+    if failed port 8081 send "GET /health HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
+
+assuming you set `-http_port=8081`.

Added: aurora/site/source/documentation/0.13.0/operations/monitoring.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.13.0/operations/monitoring.md?rev=1739360&view=auto
==============================================================================
--- aurora/site/source/documentation/0.13.0/operations/monitoring.md (added)
+++ aurora/site/source/documentation/0.13.0/operations/monitoring.md Fri Apr 15 20:21:30 2016
@@ -0,0 +1,181 @@
+# Monitoring your Aurora cluster
+
+Before you start running important services in your Aurora cluster, it's important to set up
+monitoring and alerting of Aurora itself.  Most of your monitoring can be against the scheduler,
+since it will give you a global view of what's going on.
+
+## Reading stats
+The scheduler exposes a *lot* of instrumentation data via its HTTP interface. You can get a quick
+peek at the first few of these in our vagrant image:
+
+    $ vagrant ssh -c 'curl -s localhost:8081/vars | head'
+    async_tasks_completed 1004
+    attribute_store_fetch_all_events 15
+    attribute_store_fetch_all_events_per_sec 0.0
+    attribute_store_fetch_all_nanos_per_event 0.0
+    attribute_store_fetch_all_nanos_total 3048285
+    attribute_store_fetch_all_nanos_total_per_sec 0.0
+    attribute_store_fetch_one_events 3391
+    attribute_store_fetch_one_events_per_sec 0.0
+    attribute_store_fetch_one_nanos_per_event 0.0
+    attribute_store_fetch_one_nanos_total 454690753
+
+These values are served as `Content-Type: text/plain`, with each line containing a space-separated metric
+name and value. Values may be integers, doubles, or strings (note: strings are static, others
+may be dynamic).
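+
+For example, a monitoring agent could pull a single metric by name (a sketch; it assumes the
+scheduler is listening on port 8081):
+
+    curl -s localhost:8081/vars | awk '$1 == "framework_registered" {print $2}'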
+
+If your monitoring infrastructure prefers JSON, the scheduler exports that as well:
+
+    $ vagrant ssh -c 'curl -s localhost:8081/vars.json | python -mjson.tool | head'
+    {
+        "async_tasks_completed": 1009,
+        "attribute_store_fetch_all_events": 15,
+        "attribute_store_fetch_all_events_per_sec": 0.0,
+        "attribute_store_fetch_all_nanos_per_event": 0.0,
+        "attribute_store_fetch_all_nanos_total": 3048285,
+        "attribute_store_fetch_all_nanos_total_per_sec": 0.0,
+        "attribute_store_fetch_one_events": 3409,
+        "attribute_store_fetch_one_events_per_sec": 0.0,
+        "attribute_store_fetch_one_nanos_per_event": 0.0,
+
+This will be the same data as above, served with `Content-Type: application/json`.
+
+## Viewing live stat samples on the scheduler
+The scheduler uses the Twitter commons stats library, which keeps an internal time-series database
+of exported variables - nearly everything in `/vars` is available for instant graphing.  This is
+useful for debugging, but is not a replacement for an external monitoring system.
+
+You can view these graphs on a scheduler at `/graphview`.  It supports some composition and
+aggregation of values, which can be invaluable when triaging a problem.  For example, if you have
+the scheduler running in vagrant, check out these links:
+
+* [simple graph](http://192.168.33.7:8081/graphview?query=jvm_uptime_secs)
+* [complex composition](http://192.168.33.7:8081/graphview?query=rate\(scheduler_log_native_append_nanos_total\)%2Frate\(scheduler_log_native_append_events\)%2F1e6)
+
+### Counters and gauges
+Among numeric stats, there are two fundamental types of stats exported: _counters_ and _gauges_.
+Counters are guaranteed to be monotonically-increasing for the lifetime of a process, while gauges
+may decrease in value.  Aurora uses counters to represent things like the number of times an event
+has occurred, and gauges to capture things like the current length of a queue.  Counters are a
+natural fit for accurate composition into [rate ratios](http://en.wikipedia.org/wiki/Rate_ratio)
+(useful for sample-resistant latency calculation), while gauges are not.
+
+# Alerting
+
+## Quickstart
+If you are looking for just bare-minimum alerting to get something in place quickly, set up alerting
+on `framework_registered` and `task_store_LOST`. These will give you a decent picture of overall
+health.
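+
+As a sketch of such a bare-minimum check (the port and alerting mechanism are illustrative),
+a cron-driven script could be as simple as:
+
+    #!/bin/bash
+    # Alert when this scheduler instance does not claim leadership with the Mesos master.
+    registered=$(curl -s localhost:8081/vars | awk '$1 == "framework_registered" {print $2}')
+    [ "$registered" = "1" ] || echo "CRITICAL: Aurora scheduler is not registered"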
+
+## A note on thresholds
+One of the most difficult things in monitoring is choosing alert thresholds. With many of these
+stats, there is no value we can offer as a threshold that will be guaranteed to work for you. It
+will depend on the size of your cluster, number of jobs, churn of tasks in the cluster, etc. We
+recommend you start with a strict value after viewing a small amount of collected data, and then
+adjust thresholds as you see fit. Feel free to ask us if you would like to validate that your alerts
+and thresholds make sense.
+
+## Important stats
+
+### `jvm_uptime_secs`
+Type: integer counter
+
+The number of seconds the JVM process has been running. Comes from
+[RuntimeMXBean#getUptime()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/RuntimeMXBean.html#getUptime\(\))
+
+Detecting resets (decreasing values) on this stat will tell you that the scheduler is failing to
+stay alive.
+
+Look at the scheduler logs to identify the reason the scheduler is exiting.
+
+### `system_load_avg`
+Type: double gauge
+
+The current load average of the system for the last minute. Comes from
+[OperatingSystemMXBean#getSystemLoadAverage()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html?is-external=true#getSystemLoadAverage\(\)).
+
+A high sustained value suggests that the scheduler machine may be over-utilized.
+
+Use standard unix tools like `top` and `ps` to track down the offending process(es).
+
+### `process_cpu_cores_utilized`
+Type: double gauge
+
+The current number of CPU cores in use by the JVM process. This should not exceed the number of
+logical CPU cores on the machine. Derived from
+[OperatingSystemMXBean#getProcessCpuTime()](http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html)
+
+A high sustained value indicates that the scheduler is overworked. Due to current internal design
+limitations, if this value is sustained at `1`, there is a good chance the scheduler is under water.
+
+There are two main inputs that tend to drive this figure: task scheduling attempts and status
+updates from Mesos.  You may see activity in the scheduler logs to give an indication of where
+time is being spent.  Beyond that, it really takes good familiarity with the code to effectively
+triage this.  We suggest engaging with an Aurora developer.
+
+### `task_store_LOST`
+Type: integer gauge
+
+The number of tasks stored in the scheduler that are in the `LOST` state, and have been rescheduled.
+
+If this value is increasing at a high rate, it is a sign of trouble.
+
+There are many sources of `LOST` tasks in Mesos: the scheduler, master, slave, and executor can all
+trigger this.  The first step is to look in the scheduler logs for `LOST` to identify where the
+state changes are originating.
+
+### `scheduler_resource_offers`
+Type: integer counter
+
+The number of resource offers that the scheduler has received.
+
+For a healthy scheduler, this value must be increasing over time.
+
+Assuming the scheduler is up and otherwise healthy, you will want to check if the master thinks it
+is sending offers. You should also look at the master's web interface to see if it has a large
+number of outstanding offers that are waiting to be returned.
+
+### `framework_registered`
+Type: binary integer counter
+
+Will be `1` for the leading scheduler that is registered with the Mesos master, and `0` for
+passive schedulers.
+
+A sustained period without a `1` (or where `sum() != 1`) warrants investigation.
+
+If there is no leading scheduler, look in the scheduler and master logs for why.  If there are
+multiple schedulers claiming leadership, this suggests a split brain and warrants filing a critical
+bug.
+
+### `rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)`
+Type: rate ratio of integer counters
+
+This composes two counters to compute a windowed figure for the latency of replicated log writes.
+
+A hike in this value suggests disk bandwidth contention.
+
+Look in scheduler logs for any reported oddness with saving to the replicated log. Also use
+standard tools like `vmstat` and `iotop` to identify whether the disk has become slow or
+over-utilized. We suggest using a dedicated disk for the replicated log to mitigate this.
+
+### `timed_out_tasks`
+Type: integer counter
+
+Tracks the number of times the scheduler has given up while waiting
+(for `-transient_task_state_timeout`) to hear back about a task that is in a transient state
+(e.g. `ASSIGNED`, `KILLING`), and has moved to `LOST` before rescheduling.
+
+This value is currently known to increase occasionally when the scheduler fails over
+([AURORA-740](https://issues.apache.org/jira/browse/AURORA-740)). However, any large spike in this
+value warrants investigation.
+
+The scheduler will log when it times out a task. You should trace the task ID of the timed out
+task into the master, slave, and/or executors to determine where the message was dropped.
+
+### `http_500_responses_events`
+Type: integer counter
+
+The total number of HTTP 500 status responses sent by the scheduler. Includes API and asset serving.
+
+An increase warrants investigation.
+
+Look in scheduler logs to identify why the scheduler returned a 500; there should be a stack trace.


