From: wfarner@apache.org
To: commits@aurora.apache.org
Reply-To: dev@aurora.apache.org
Date: Sat, 12 Dec 2015 01:46:49 -0000
Subject: svn commit: r1719617 [3/4] - in /aurora/site: publish/ publish/blog/ publish/blog/2015-upcoming-apache-aurora-meetups/ publish/blog/aurora-0-6-0-incubating-released/ publish/blog/aurora-0-7-0-incubating-released/ publish/blog/aurora-0-8-0-released/ pub...
Message-Id: <20151212014650.4B35C3A2309@svn01-us-west.apache.org>

Modified: aurora/site/publish/documentation/latest/hooks/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/hooks/index.html?rev=1719617&r1=1719616&r2=1719617&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/hooks/index.html (original)
+++ aurora/site/publish/documentation/latest/hooks/index.html Sat Dec 12 01:46:48 2015
@@ -21,12 +21,11 @@
-
-
+ +
@@ -81,7 +81,7 @@ return False. Designers/con consider whether or not to error-trap them. You can error trap at the highest level very generally and always pass the pre_ hook by returning True. For example:

-
def pre_create(...):
+
def pre_create(...):
   do_something()  # if do_something fails with an exception, the create_job is not attempted!
   return True
 
@@ -89,10 +89,11 @@ returning True. For example
 def pre_create(...):
   try:
     do_something()  # may cause exception
-  except Exception:  # generic error trap will catch it
+  except Exception:  # generic error trap will catch it
     pass  # and ignore the exception
   return True  # create_job will run in any case!
-
+
+

post_<method_name>: A post_ hook executes after its associated method successfully finishes running. If it fails, the already executed method is unaffected. A post_ hook’s error is trapped, and any later operations are unaffected.

err_<method_name>: Executes only when its associated method returns a status other than OK or throws an exception. If an err_ hook fails, the already executed method is unaffected. An err_ hook’s error is trapped, and any later operations are unaffected.

@@ -187,11 +188,12 @@ returning True. For example

By default, hooks are inactive. If you do not want to use hooks, you do not need to make any changes to your code. If you do want to use hooks, you will need to alter your .aurora config file to activate them both for the configuration as a whole and for individual Jobs. And, of course, you will need to define in your config file what happens when a particular hook executes.

-

.aurora Config File Settings

+

.aurora Config File Settings

You can define a top-level hooks variable in any .aurora config file. hooks is a list of all objects that define hooks used by Jobs defined in that config file. If you do not want to define any hooks for a configuration, hooks is optional.

-
hooks = [Object_with_defined_hooks1, Object_with_defined_hooks2]
-
+
hooks = [Object_with_defined_hooks1, Object_with_defined_hooks2]
+
+

Be careful when assembling a config file using include on multiple smaller config files. If there are multiple files that assign a value to hooks, only the last assignment made will stick. For example, if x.aurora has hooks = [a, b, c] and y.aurora has hooks = [d, e, f] and z.aurora has, in this order, include x.aurora and include y.aurora, the hooks value will be [d, e, f].
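The last-assignment-wins behaviour described above can be illustrated with a small, self-contained sketch. Plain `exec` over strings is only a hypothetical stand-in for Aurora's actual include machinery, but it models the same namespace semantics:

```python
# Hypothetical model of `include`: each included file is evaluated in the
# shared config namespace, so a later assignment to `hooks` silently
# replaces an earlier one.
x_aurora = "hooks = ['a', 'b', 'c']"   # stands in for x.aurora
y_aurora = "hooks = ['d', 'e', 'f']"   # stands in for y.aurora

namespace = {}
exec(x_aurora, namespace)  # include x.aurora
exec(y_aurora, namespace)  # include y.aurora
print(namespace['hooks'])  # → ['d', 'e', 'f']
```

If you need hooks from both files, combine the lists explicitly in the last file that assigns `hooks`.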

Also, for any Job that you want to use hooks with, its Job definition in the .aurora config file must set an enable_hooks flag to True (it defaults to False). By default, hooks are disabled and you must enable them for Jobs of your choice.

@@ -199,21 +201,24 @@ returning True. For example

To summarize, to use hooks for a particular job, you must both activate hooks for your config file as a whole, and for that job. Activating hooks only for individual jobs won’t work, nor will only activating hooks for your config file as a whole. You must also specify the hooks’ defining object in the hooks variable.

Recall that .aurora config files are written in Pystachio. So the following turns on hooks for production jobs at cluster1 and cluster2, but leaves them off for similar jobs with a defined user role. Of course, you also need to list the objects that define the hooks in your config file’s hooks variable.

-
jobs = [
-        Job(enable_hooks = True, cluster = c, env = 'prod') for c in ('cluster1', 'cluster2')
+
jobs = [
+        Job(enable_hooks = True, cluster = c, env = 'prod') for c in ('cluster1', 'cluster2')
        ]
 jobs.extend(
-   Job(cluster = c, env = 'prod', role = getpass.getuser()) for c in ('cluster1', 'cluster2'))
+   Job(cluster = c, env = 'prod', role = getpass.getuser()) for c in ('cluster1', 'cluster2'))
    # Hooks disabled for these jobs
-
+
+

Command Line

All Aurora Command Line commands now accept an .aurora config file as an optional parameter (some, of course, accept it as a required parameter). Whenever a command has a .aurora file parameter, any hooks specified and activated in the .aurora file can be used. For example:

-
aurora job restart cluster1/role/env/app myapp.aurora
-
+
aurora job restart cluster1/role/env/app myapp.aurora
+
+

The command activates any hooks specified and activated in myapp.aurora. For the restart command, that is the only thing the myapp.aurora parameter does. So, if the command was the following, since there is no .aurora config file to specify any hooks, no hooks on the restart command can run.

-
aurora job restart cluster1/role/env/app
-
+
aurora job restart cluster1/role/env/app
+
+

Hooks Protocol

Any object defined in the .aurora config file can define hook methods. You should define your hook methods within a class, and then use the class name as a value in the hooks list in your config file.

@@ -221,21 +226,23 @@ returning True. For example

Note that you can define other methods in the class that its hook methods can call; all the logic of a hook does not have to be in its definition.

The following example defines a class containing a pre_kill_job hook definition that calls another method defined in the class.

-
# Defines a method pre_kill_job
+
# Defines a method pre_kill_job
 class KillConfirmer(object):
   def confirm(self, msg):
-    return raw_input(msg).lower() == 'yes'
+    return raw_input(msg).lower() == 'yes'
 
   def pre_kill_job(self, job_key, shards=None):
-    shards = ('shards %s' % shards) if shards is not None else 'all shards'
-    return self.confirm('Are you sure you want to kill %s (%s)? (yes/no): '
+    shards = ('shards %s' % shards) if shards is not None else 'all shards'
+    return self.confirm('Are you sure you want to kill %s (%s)? (yes/no): '
                         % (job_key, shards))
-
+
+

pre_ Methods

pre_ methods have the signature:

-
pre_<API method name>(self, <associated method's signature>)
-
+
pre_<API method name>(self, <associated method's signature>)
+
+

pre_ methods have the same signature as their associated method, with the addition of self as the first parameter. See the chart above for the mapping of parameters to methods. When writing pre_ methods, you can use the * and ** syntax to designate that all unspecified parameters are passed in a list to the *ed variable and all named parameters with values are passed as name/value pairs to the **ed variable.

If this method returns False, the API command call aborts.
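A minimal sketch of such a pre_ hook; the class name, the guarded condition, and the `shards` keyword are all hypothetical, chosen only to show the `*`/`**` capture and the abort-on-False behaviour:

```python
# Hypothetical pre_ hook: unspecified positional args land in `args`,
# named args in `kw`; returning False aborts the associated API call.
class SafetyHooks(object):
    def pre_restart(self, *args, **kw):
        if kw.get('shards') == 'all':
            return False  # abort the restart
        return True
```

Here `SafetyHooks().pre_restart(shards='all')` returns False, so the restart would not be attempted.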

@@ -243,8 +250,9 @@ returning True. For example

err_ Methods

err_ methods have the signature:

-
err_<API method name>(self, exc, <associated method's signature>)
-
+
err_<API method name>(self, exc, <associated method's signature>)
+
+

err_ methods have the same signature as their associated method, with the addition of a first parameter self and a second parameter exc. exc is either a result with responseCode other than ResponseCode.OK or an Exception. See the chart above for the mapping of parameters to methods. When writing err_ methods, you can use the * and ** syntax to designate that all unspecified parameters are passed in a list to the *ed variable and all named parameters with values are passed as name/value pairs to the **ed variable.

err_ method return codes are ignored.
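A hedged sketch of the shape an err_ hook might take; the method name `err_kill_job` and the logging behaviour are illustrative, not taken from the Aurora codebase:

```python
# Hypothetical err_ hook: `exc` is either an Exception or a result object
# whose responseCode is not OK.
class ErrReporter(object):
    def err_kill_job(self, exc, *args, **kw):
        if isinstance(exc, Exception):
            print('kill_job raised: %r' % exc)
        else:
            print('kill_job returned non-OK result: %r' % exc)
        return False  # return codes of err_ hooks are ignored anyway
```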

@@ -252,8 +260,9 @@ returning True. For example

post_ Methods

post_ methods have the signature:

-
post_<API method name>(self, result, <associated method signature>)
-
+
post_<API method name>(self, result, <associated method signature>)
+
+

post_ method parameters are self, then result, followed by the same parameter signature as their associated method. result is the result of the associated method call. See the chart above for the mapping of parameters to methods. When writing post_ methods, you can use the * and ** syntax to designate that all unspecified arguments are passed in a list to the *ed parameter and all unspecified named arguments with values are passed as name/value pairs to the **ed parameter.

post_ method return codes are ignored.
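For symmetry, a hedged sketch of a post_ hook; `post_create_job` and its body are illustrative only. `result` is whatever the associated method returned:

```python
# Hypothetical post_ hook: runs after create_job succeeds; any exception
# raised here is trapped, and the return value is ignored.
class PostReporter(object):
    def post_create_job(self, result, *args, **kw):
        print('create_job finished with: %r' % result)
```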

@@ -261,8 +270,9 @@ returning True. For example

Generic Hooks

There are seven Aurora API Methods which any of the three hook types can attach to. Thus, there are 21 possible hook/method combinations for a single .aurora config file. Say that you define pre_ and post_ hooks for the restart method. That leaves 19 undefined hook/method combinations; err_restart and the 3 pre_, post_, and err_ hooks for each of the other 6 hookable methods. You can define what happens when any of these otherwise undefined 19 hooks execute via a generic hook, whose signature is:

-
generic_hook(self, hook_config, event, method_name, result_or_err, args*, kw**)
-
+
generic_hook(self, hook_config, event, method_name, result_or_err, args*, kw**)
+
+

where:

    @@ -280,37 +290,40 @@ returning True. For example

Example:

-
# Overrides the standard do-nothing generic_hook by adding a log writing operation.
+
# Overrides the standard do-nothing generic_hook by adding a log writing operation.
 from twitter.common import log
   class Logger(object):
-    '''Adds to the log every time a hookable API method is called'''
+    '''Adds to the log every time a hookable API method is called'''
    def generic_hook(self, hook_config, event, method_name, result_or_err, *args, **kw):
-      log.info('%s: %s_%s of %s'
+      log.info('%s: %s_%s of %s'
                % (self.__class__.__name__, event, method_name, hook_config.job_key))
-
+
+

Hooks Process Checklist

  1. In your .aurora config file, add a hooks variable. Note that you may want to define a .aurora file only for hook definitions and then include this file in multiple other config files that you want to use the same hooks.
-
hooks = []
-
+
hooks = []
+
+
  2. In the hooks variable, list all objects that define hooks used by Jobs defined in this config:
-
hooks = [Object_hook_definer1, Object_hook_definer2]
-
+
hooks = [Object_hook_definer1, Object_hook_definer2]
+
+
  3. For each job that uses hooks in this config file, add enable_hooks = True to the Job definition. Note that this is necessary even if you only want to use the generic hook.

  4. Write your pre_, post_, and err_ hook definitions as part of an object definition in your .aurora config file.

  5. If desired, write your generic_hook definition as part of an object definition in your .aurora config file. Remember, the object must be listed as a member of hooks.

  6. If your Aurora command line command does not otherwise take an .aurora config file argument, add the appropriate .aurora file as an argument in order to define and activate the configuration’s hooks.

+
- + \ No newline at end of file Modified: aurora/site/publish/documentation/latest/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/index.html?rev=1719617&r1=1719616&r2=1719617&view=diff ============================================================================== --- aurora/site/publish/documentation/latest/index.html (original) +++ aurora/site/publish/documentation/latest/index.html Sat Dec 12 01:46:48 2015 @@ -21,12 +21,11 @@ -
-
+ + - + \ No newline at end of file Modified: aurora/site/publish/documentation/latest/monitoring/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/monitoring/index.html?rev=1719617&r1=1719616&r2=1719617&view=diff ============================================================================== --- aurora/site/publish/documentation/latest/monitoring/index.html (original) +++ aurora/site/publish/documentation/latest/monitoring/index.html Sat Dec 12 01:46:48 2015 @@ -21,12 +21,11 @@ -
-
+ +
@@ -49,7 +49,7 @@ since it will give you a global view of

The scheduler exposes a lot of instrumentation data via its HTTP interface. You can get a quick peek at the first few of these in our vagrant image:

-
$ vagrant ssh -c 'curl -s localhost:8081/vars | head'
+
$ vagrant ssh -c 'curl -s localhost:8081/vars | head'
 async_tasks_completed 1004
 attribute_store_fetch_all_events 15
 attribute_store_fetch_all_events_per_sec 0.0
@@ -60,24 +60,26 @@ attribute_store_fetch_one_events 3391
 attribute_store_fetch_one_events_per_sec 0.0
 attribute_store_fetch_one_nanos_per_event 0.0
 attribute_store_fetch_one_nanos_total 454690753
-
+
+

These values are served as Content-Type: text/plain, with each line containing a space-separated metric name and value. Values may be integers, doubles, or strings (note: strings are static, others may be dynamic).
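As a sketch of how a monitoring consumer might ingest this payload, the following parses the space-separated lines into a dict; `parse_vars` is a hypothetical helper, not part of Aurora:

```python
# Parse the plain-text /vars payload: one "name value" pair per line,
# where values may be integers, doubles, or (static) strings.
sample = """async_tasks_completed 1004
attribute_store_fetch_all_events_per_sec 0.0
jvm_uptime_secs 12345"""

def parse_vars(text):
    metrics = {}
    for line in text.splitlines():
        name, _, value = line.partition(' ')
        for cast in (int, float):
            try:
                metrics[name] = cast(value)
                break
            except ValueError:
                continue
        else:
            metrics[name] = value  # fall back to the raw string
    return metrics

print(parse_vars(sample)['async_tasks_completed'])  # → 1004
```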

If your monitoring infrastructure prefers JSON, the scheduler exports that as well:

-
$ vagrant ssh -c 'curl -s localhost:8081/vars.json | python -mjson.tool | head'
+
$ vagrant ssh -c 'curl -s localhost:8081/vars.json | python -mjson.tool | head'
 {
-    "async_tasks_completed": 1009,
-    "attribute_store_fetch_all_events": 15,
-    "attribute_store_fetch_all_events_per_sec": 0.0,
-    "attribute_store_fetch_all_nanos_per_event": 0.0,
-    "attribute_store_fetch_all_nanos_total": 3048285,
-    "attribute_store_fetch_all_nanos_total_per_sec": 0.0,
-    "attribute_store_fetch_one_events": 3409,
-    "attribute_store_fetch_one_events_per_sec": 0.0,
-    "attribute_store_fetch_one_nanos_per_event": 0.0,
-
+ "async_tasks_completed": 1009, + "attribute_store_fetch_all_events": 15, + "attribute_store_fetch_all_events_per_sec": 0.0, + "attribute_store_fetch_all_nanos_per_event": 0.0, + "attribute_store_fetch_all_nanos_total": 3048285, + "attribute_store_fetch_all_nanos_total_per_sec": 0.0, + "attribute_store_fetch_one_events": 3409, + "attribute_store_fetch_one_events_per_sec": 0.0, + "attribute_store_fetch_one_nanos_per_event": 0.0, +
+

This will be the same data as above, served with Content-Type: application/json.

Viewing live stat samples on the scheduler

@@ -118,177 +120,125 @@ recommend you start with a strict value adjust thresholds as you see fit. Feel free to ask us if you would like to validate that your alerts and thresholds make sense.

-

jvm_uptime_secs

+

Important stats

-

Type: integer counter

+

jvm_uptime_secs

-

Description

+

Type: integer counter

The number of seconds the JVM process has been running. Comes from RuntimeMXBean#getUptime().

-

Alerting

-

Detecting resets (decreasing values) on this stat will tell you that the scheduler is failing to stay alive.

-

Triage

-

Look at the scheduler logs to identify the reason the scheduler is exiting.

-

system_load_avg

+

system_load_avg

Type: double gauge

-

Description

-

The current load average of the system for the last minute. Comes from OperatingSystemMXBean#getSystemLoadAverage().

-

Alerting

-

A high sustained value suggests that the scheduler machine may be over-utilized.

-

Triage

-

Use standard unix tools like top and ps to track down the offending process(es).

-

process_cpu_cores_utilized

+

process_cpu_cores_utilized

Type: double gauge

-

Description

-

The current number of CPU cores in use by the JVM process. This should not exceed the number of logical CPU cores on the machine. Derived from OperatingSystemMXBean#getProcessCpuTime().

-

Alerting

-

A high sustained value indicates that the scheduler is overworked. Due to current internal design limitations, if this value is sustained at 1, there is a good chance the scheduler is under water.

-

Triage

-

There are two main inputs that tend to drive this figure: task scheduling attempts and status updates from Mesos. You may see activity in the scheduler logs to give an indication of where time is being spent. Beyond that, it really takes good familiarity with the code to effectively triage this. We suggest engaging with an Aurora developer.

-

task_store_LOST

+

task_store_LOST

Type: integer gauge

-

Description

-

The number of tasks stored in the scheduler that are in the LOST state, and have been rescheduled.

-

Alerting

-

If this value is increasing at a high rate, it is a sign of trouble.

-

Triage

-

There are many sources of LOST tasks in Mesos: the scheduler, master, slave, and executor can all trigger this. The first step is to look in the scheduler logs for LOST to identify where the state changes are originating.

-

scheduler_resource_offers

+

scheduler_resource_offers

Type: integer counter

-

Description

-

The number of resource offers that the scheduler has received.

-

Alerting

-

For a healthy scheduler, this value must be increasing over time.

-
Triage
-

Assuming the scheduler is up and otherwise healthy, you will want to check if the master thinks it is sending offers. You should also look at the master’s web interface to see if it has a large number of outstanding offers that it is waiting to be returned.

-

framework_registered

+

framework_registered

Type: binary integer counter

-

Description

-

Will be 1 for the leading scheduler that is registered with the Mesos master, 0 for passive schedulers.

-

Alerting

-

A sustained period without a 1 (or where sum() != 1) warrants investigation.

-

Triage

-

If there is no leading scheduler, look in the scheduler and master logs for why. If there are multiple schedulers claiming leadership, this suggests a split brain and warrants filing a critical bug.

-

rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)

+

rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)

Type: rate ratio of integer counters

-

Description

-

This composes two counters to compute a windowed figure for the latency of replicated log writes.

-

Alerting

-

A hike in this value suggests disk bandwidth contention.

-

Triage

-

Look in scheduler logs for any reported oddness with saving to the replicated log. Also use standard tools like vmstat and iotop to identify whether the disk has become slow or over-utilized. We suggest using a dedicated disk for the replicated log to mitigate this.
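The windowed figure above is simply the delta of the nanos counter divided by the delta of the events counter between two samples. A sketch, with a hypothetical function name and hand-picked sample values:

```python
# Windowed rate ratio: average replicated-log append latency between two
# samples of the cumulative counters (nanos_total, events).
def append_latency_nanos(nanos_t0, events_t0, nanos_t1, events_t1):
    d_events = events_t1 - events_t0
    if d_events == 0:
        return 0.0  # no appends in the window
    return (nanos_t1 - nanos_t0) / float(d_events)

print(append_latency_nanos(0, 0, 1000000, 10))  # → 100000.0 ns per append
```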

-

timed_out_tasks

+

timed_out_tasks

Type: integer counter

-

Description

-

Tracks the number of times the scheduler has given up while waiting (for -transient_task_state_timeout) to hear back about a task that is in a transient state (e.g. ASSIGNED, KILLING), and has moved to LOST before rescheduling.

-

Alerting

-

This value is currently known to increase occasionally when the scheduler fails over (AURORA-740). However, any large spike in this value warrants investigation.

-

Triage

-

The scheduler will log when it times out a task. You should trace the task ID of the timed out task into the master, slave, and/or executors to determine where the message was dropped.

-

http_500_responses_events

+

http_500_responses_events

Type: integer counter

-

Description

-

The total number of HTTP 500 status responses sent by the scheduler. Includes API and asset serving.

-

Alerting

-

An increase warrants investigation.

-

Triage

-

Look in scheduler logs to identify why the scheduler returned a 500, there should be a stack trace.

+
- + \ No newline at end of file Modified: aurora/site/publish/documentation/latest/presentations/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/presentations/index.html?rev=1719617&r1=1719616&r2=1719617&view=diff ============================================================================== --- aurora/site/publish/documentation/latest/presentations/index.html (original) +++ aurora/site/publish/documentation/latest/presentations/index.html Sat Dec 12 01:46:48 2015 @@ -21,12 +21,11 @@ -
-
+ +
@@ -90,11 +90,11 @@

March 25, 2014 at Aurora and Mesos Frameworks Meetup

+
- + \ No newline at end of file Modified: aurora/site/publish/documentation/latest/resources/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/resources/index.html?rev=1719617&r1=1719616&r2=1719617&view=diff ============================================================================== --- aurora/site/publish/documentation/latest/resources/index.html (original) +++ aurora/site/publish/documentation/latest/resources/index.html Sat Dec 12 01:46:48 2015 @@ -21,12 +21,11 @@ -
-
+ +
@@ -206,11 +206,11 @@ that role.

production jobs may preempt tasks from any non-production job. A production task may only be preempted by tasks from production jobs in the same role with higher priority.

+
- +
@@ -103,24 +103,26 @@ considerations.

Server Configuration

At a minimum you need to set 4 command-line flags on the scheduler:

-
-http_authentication_mechanism=BASIC
+
-http_authentication_mechanism=BASIC
 -shiro_realm_modules=INI_AUTHNZ
 -shiro_ini_path=path/to/security.ini
-
+
+

And create a security.ini file like so:

-
[users]
+
[users]
 sally = apple, admin
 
 [roles]
 admin = *
-
+
+

The details of the security.ini file are explained below. Note that this file contains plaintext, unhashed passwords.

Client Configuration

To configure the client for HTTP Basic authentication, add an entry to ~/.netrc with your credentials

-
% cat ~/.netrc
+
% cat ~/.netrc
 # ...
 
 machine aurora.example.com
@@ -128,68 +130,78 @@ login sally
 password apple
 
 # ...
-
+
+

No changes are required to clusters.json.

-

HTTP SPNEGO Authentication (Kerberos)

+

HTTP SPNEGO Authentication (Kerberos)

Server Configuration

At a minimum you need to set 6 command-line flags on the scheduler:

-
-http_authentication_mechanism=NEGOTIATE
+
-http_authentication_mechanism=NEGOTIATE
 -shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
 -kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
 -kerberos_server_keytab=path/to/aurora.example.com.keytab
 -shiro_ini_path=path/to/security.ini
-
+
+

And create a security.ini file like so:

-
% cat path/to/security.ini
+
% cat path/to/security.ini
 [users]
 sally = _, admin
 
 [roles]
 admin = *
-
+
+

What’s going on here? First, Aurora must be configured to request Kerberos credentials when presented with an unauthenticated request. This is achieved by setting

-
-http_authentication_mechanism=NEGOTIATE
-
+
-http_authentication_mechanism=NEGOTIATE
+
+

Next, a Realm module must be configured to authenticate the current request using the Kerberos credentials that were requested. Aurora ships with a realm module that can do this

-
-shiro_realm_modules=KERBEROS5_AUTHN[,...]
-
+
-shiro_realm_modules=KERBEROS5_AUTHN[,...]
+
+

The Kerberos5Realm requires a keytab file and a server principal name. The principal name will usually be in the form HTTP/aurora.example.com@EXAMPLE.COM.

-
-kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
+
-kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
 -kerberos_server_keytab=path/to/aurora.example.com.keytab
-
+
+

The Kerberos5 realm module is authentication-only. For scheduler security to work you must also enable a realm module that provides an Authorizer implementation. For example, to do this using the IniShiroRealmModule:

-
-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
-
+
-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
+
+

You can then configure authorization using a security.ini file as described below (the password field is ignored). You must configure the realm module with the path to this file:

-
-shiro_ini_path=path/to/security.ini
-
+
-shiro_ini_path=path/to/security.ini
+
+

Client Configuration

To use Kerberos on the client-side you must build Kerberos-enabled client binaries. Do this with

-
./pants binary src/main/python/apache/aurora/client/cli:kaurora
-./pants binary src/main/python/apache/aurora/admin:kaurora_admin
-
+
./pants binary src/main/python/apache/aurora/kerberos:kaurora
+./pants binary src/main/python/apache/aurora/kerberos:kaurora_admin
+
+

You must also configure each cluster where you’ve enabled Kerberos on the scheduler to use Kerberos authentication. Do this by setting auth_mechanism to KERBEROS in clusters.json.

-
% cat ~/.aurora/clusters.json
+
% cat ~/.aurora/clusters.json
 {
-    "devcluster": {
-        "auth_mechanism": "KERBEROS",
+    "devcluster": {
+        "auth_mechanism": "KERBEROS",
         ...
     },
     ...
 }
-
+
+

Authorization

Given a means to authenticate the entity a client claims they are, we need to define what privileges they have.

@@ -202,16 +214,17 @@ likely the preferred approach. However are rapidly changing, or if your access control information already exists in another system.

You can enable INI-based configuration with following scheduler command line arguments:

-
-http_authentication_mechanism=BASIC
+
-http_authentication_mechanism=BASIC
 -shiro_ini_path=path/to/security.ini
-
+
+

note As the argument name reveals, this is using Shiro’s IniRealm behind the scenes.

The INI file will contain two sections - users and roles. Here’s an example for what might be in security.ini:

-
[users]
+
[users]
 sally = apple, admin
 jim = 123456, accounting
 becky = letmein, webapp
@@ -222,7 +235,8 @@ steve = password
 admin = *
 accounting = thrift.AuroraAdmin:setQuota
 webapp = thrift.AuroraSchedulerManager:*:webapp
-
+
+

The users section defines user credentials and the role(s) they are members of. These lines are of the format <user> = <password>[, <role>...]. As you probably noticed, the passwords are in plaintext and as a result read access to this file should be restricted.
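A tiny parser sketch for this line format; `parse_user_line` is hypothetical (Shiro's IniRealm does the real parsing), shown only to make the `<user> = <password>[, <role>...]` shape concrete:

```python
# Split "sally = apple, admin" into (user, password, roles).
def parse_user_line(line):
    user, _, rest = line.partition('=')
    parts = [p.strip() for p in rest.split(',')]
    return user.strip(), parts[0], parts[1:]

print(parse_user_line('sally = apple, admin'))  # → ('sally', 'apple', ['admin'])
```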

@@ -254,7 +268,7 @@ for more information.

Packaging a realm module

Package your custom Realm(s) with a Guice module that exposes a Set<Realm> multibinding.

-
package com.example;
+
package com.example;
 
 import com.google.inject.AbstractModule;
 import com.google.inject.multibindings.Multibinder;
@@ -272,11 +286,13 @@ for more information.

    // Realm implementation.
  }
}
-
+
+

To use your module in the scheduler, include it as a realm module based on its fully-qualified class name:

-
-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ,com.example.MyRealmModule
-
+
-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ,com.example.MyRealmModule
+
+

Known Issues

While the APIs and SPIs we ship with are stable as of 0.8.0, we are aware of several incremental @@ -289,11 +305,11 @@ improvements. Please follow, vote, or se * AURORA-1293: Consider defining a JSON format in place of INI * AURORA-1179: Supported hashed passwords in security.ini * AURORA-1295: Support security for the ReadOnlyScheduler service

+
- + \ No newline at end of file Modified: aurora/site/publish/documentation/latest/sla/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/sla/index.html?rev=1719617&r1=1719616&r2=1719617&view=diff ============================================================================== --- aurora/site/publish/documentation/latest/sla/index.html (original) +++ aurora/site/publish/documentation/latest/sla/index.html Sat Dec 12 01:46:48 2015 @@ -21,12 +21,11 @@ -
-
+ +
@@ -60,8 +60,9 @@ Agreements) metrics that defining a contractual relationship between the Aurora/Mesos platform and hosted services.

-

The Aurora SLA feature currently supports stat collection only for service (non-cron) -production jobs ("production = True" in your .aurora config).

+

The Aurora SLA feature is by default only enabled for service (non-cron) +production jobs ("production = True" in your .aurora config). It can be enabled for +non-production services via the scheduler command line flag -sla_non_prod_metrics.

Counters that track SLA measurements are computed periodically within the scheduler. The individual instance metrics are refreshed every minute (configurable via @@ -145,7 +146,7 @@ percentiles (50th,75th,90th,95th and 99t You can also get customized real-time stats from aurora client. See aurora sla -h for more details.

-

Median Time To Assigned (MTTA)

+

Median Time To Assigned (MTTA)

Median time a job spends waiting for its tasks to be assigned to a host. This is a combined metric that helps track the dependency of scheduling performance on the requested resources @@ -187,7 +188,7 @@ metric that helps track the dependency o that are still PENDING. This ensures straggler instances (e.g. with unreasonable resource constraints) do not affect metric curves.
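Both MTTA and MTTR are medians over per-task timings. The underlying computation can be sketched as follows; the scheduler computes these internally, so this is illustrative only:

```python
# Median of per-task wait times (e.g. seconds from PENDING to ASSIGNED).
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0

print(median([2, 9, 4, 7, 5]))  # → 5
```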

-

Median Time To Running (MTTR)

+

Median Time To Running (MTTR)

Median time a job waits for its tasks to reach RUNNING state. This is a comprehensive metric reflecting on the overall time it takes for the Aurora/Mesos to start executing user content.

@@ -234,11 +235,11 @@ unreasonable resource constraints) do no
  • All metrics are calculated at a pre-defined interval (currently set at 1 minute). Scheduler restarts may result in missed collections.

Modified: aurora/site/publish/documentation/latest/storage-config/index.html

    Mesos replicated log configuration flags

-native_log_quorum_size

    Defines the Mesos replicated log quorum size. See the replicated log configuration document on how to choose the right value.

-native_log_file_path

    Location of the Mesos replicated log files. Consider allocating a dedicated disk (preferably SSD) for Mesos replicated log files to ensure optimal storage performance.

-native_log_zk_group_path

    ZooKeeper path used for Mesos replicated log quorum discovery.


    Configuration options for the Aurora scheduler backup manager.

-backup_interval

    The interval on which the scheduler writes local storage backups. The default is every hour.

-backup_dir

    Directory to write backups to.

-max_saved_backups

    Maximum number of backups to retain before deleting the oldest backup(s).

This is accomplished by updating the following scheduler settings:
  • Set -mesos_master_address to a non-existent zk address. This will prevent the scheduler from registering with Mesos. E.g.: -mesos_master_address=zk://localhost:2181
  • -max_registration_delay - set to a sufficiently long interval to prevent registration timeout and, as a result, scheduler suicide. E.g.: -max_registration_delay=360mins
  • Make sure the -reconciliation_initial_delay option is set high enough (e.g.: 365days) to prevent accidental task GC. This is important as the scheduler will attempt to reconcile the cluster state and will kill all tasks when restarted with an empty Mesos replicated log.
  • Restart all schedulers

    • Stop schedulers
    • Delete all files under -native_log_file_path on all schedulers
    • Initialize Mesos replica’s log file: mesos-log initialize --path=<-native_log_file_path>
    • Restart schedulers

    Cleanup

Undo any modifications done during the Preparation sequence.

Modified: aurora/site/publish/documentation/latest/storage/index.html

    Any time a scheduler restarts, it restores its volatile state from the most recent position recorded in the replicated log by restoring the snapshot and replaying individual log entries on top to fully recover the state up to the last write.
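The snapshot-plus-replay recovery described above can be modeled in a few lines. This toy store is illustrative only and shares nothing with Aurora's actual replicated log implementation:

```python
import copy

class ReplicatedLogStore:
    def __init__(self):
        self.state = {}          # volatile, in-memory view
        self.log = []            # durable, ordered write log
        self.snapshot = ({}, 0)  # (state copy, log position at snapshot time)

    def write(self, key, value):
        self.log.append((key, value))   # durable write first
        self.state[key] = value         # then the volatile view

    def take_snapshot(self):
        self.snapshot = (copy.deepcopy(self.state), len(self.log))

    def recover(self):
        """What a restarted scheduler does: restore the snapshot, then
        replay every log entry written after it."""
        state, pos = self.snapshot
        state = copy.deepcopy(state)
        for key, value in self.log[pos:]:
            state[key] = value
        return state

store = ReplicatedLogStore()
store.write('task/1', 'RUNNING')
store.take_snapshot()
store.write('task/2', 'PENDING')   # written after the snapshot
print(store.recover())             # -> {'task/1': 'RUNNING', 'task/2': 'PENDING'}
```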

Modified: aurora/site/publish/documentation/latest/test-resource-generation/index.html

The Aurora source repository and distributions contain several binary files to qualify the backwards-compatibility of thermos with checkpoint data. Since thermos persists state to disk, to be read by the thermos observer, it is important that we have tests that prevent regressions affecting the ability to parse previously-written data.

    Generating test files

This is accomplished by writing and running a job configuration that exercises the feature, and copying the checkpoint file from the sandbox directory; by default this is /var/run/thermos/checkpoints/<aurora task id>.
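The regression idea behind these golden files can be sketched as follows. The JSON-lines format here is a stand-in for illustration only; thermos's real checkpoint encoding is a binary format not shown in this excerpt:

```python
import json
import os
import tempfile

def write_checkpoint(path, records):
    """Stand-in for an old writer: one JSON record per line."""
    with open(path, 'w') as f:
        for rec in records:
            f.write(json.dumps(rec) + '\n')

def read_checkpoint(path):
    """Stand-in for the current reader under test."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# "Golden" file standing in for a binary checkpoint committed to the repo.
golden = [{'seq': 0, 'state': 'ACTIVE'}, {'seq': 1, 'state': 'SUCCESS'}]
path = os.path.join(tempfile.mkdtemp(), 'checkpoint')
write_checkpoint(path, golden)

# The regression test: previously-written data must still parse.
assert read_checkpoint(path) == golden
```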

Modified: aurora/site/publish/documentation/latest/thrift-deprecation/index.html
See this document for more.

Modified: aurora/site/publish/documentation/latest/tutorial/index.html
import sys
import time

def main(argv):
  SLEEP_DELAY = 10
  # Python ninjas - ignore this blatant bug.
  for i in xrang(100):
    print("Hello world! The time is now: %s. Sleeping for %d secs" % (
      time.asctime(), SLEEP_DELAY))
    sys.stdout.flush()
    time.sleep(SLEEP_DELAY)

if __name__ == "__main__":
  main(sys.argv)

    Aurora Configuration

Once we have our script/program, we need to create a configuration file that tells Aurora how to manage and launch our Job. Save the code below in the file hello_world.aurora.

pkg_path = '/vagrant/hello_world.py'

# we use a trick here to make the configuration change with
# the contents of the file, for simplicity.  in a normal setting, packages would be
# versioned, and the version number would be changed in the configuration.
import hashlib
with open(pkg_path, 'rb') as f:
  pkg_checksum = hashlib.md5(f.read()).hexdigest()

# copy hello_world.py into the local sandbox
install = Process(
  name = 'fetch_package',
  cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum))

# run the script
hello_world = Process(
  name = 'hello_world',
  cmdline = 'python hello_world.py')

# describe the task
hello_world_task = SequentialTask(
  processes = [install, hello_world],
  resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB))

jobs = [
  Service(cluster = 'devcluster',
          environment = 'devel',
          role = 'www-data',
          name = 'hello_world',
          task = hello_world_task)
]
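To see why the md5 trick in the configuration above forces an update, note that the checksum echoed into fetch_package's cmdline changes whenever the script's contents change, so Aurora sees a changed configuration:

```python
import hashlib

def pkg_checksum(contents):
    """Same computation as the .aurora config above, over raw bytes."""
    return hashlib.md5(contents).hexdigest()

v1 = pkg_checksum(b"print('hello')\n")
v2 = pkg_checksum(b"print('hello, world')\n")
print(v1 != v2)  # -> True: an edited script yields a different cmdline
```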

    For more about Aurora configuration files, see the Configuration Tutorial and the Aurora + Thermos Reference (preferably after finishing this tutorial).

What’s Going On In That Configuration File?

    More than you might think.


    /etc/aurora/clusters.json within the Aurora scheduler has the available cluster names. For Vagrant, from the top-level of your Aurora repository clone, do:

$ vagrant ssh

    Followed by:

vagrant@precise64:~$ cat /etc/aurora/clusters.json

    You’ll see something like:

[{
  "name": "devcluster",
  "zk": "192.168.33.7",
  "scheduler_zk_path": "/aurora/scheduler",
  "auth_mechanism": "UNAUTHENTICATED"
}]
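A minimal sketch of reading this file from a client's point of view, assuming only the JSON shape shown above (the literal below copies the example):

```python
import json

# In a real deployment this string would come from /etc/aurora/clusters.json.
CLUSTERS_JSON = '''
[{
  "name": "devcluster",
  "zk": "192.168.33.7",
  "scheduler_zk_path": "/aurora/scheduler",
  "auth_mechanism": "UNAUTHENTICATED"
}]
'''

# Index the cluster entries by name, as a client would when resolving
# the cluster part of a job key.
clusters = {c['name']: c for c in json.loads(CLUSTERS_JSON)}
print(sorted(clusters))              # -> ['devcluster']
print(clusters['devcluster']['zk'])  # -> 192.168.33.7
```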

    Use a name value for your job key’s cluster value.

Role names are user accounts existing on the slave machines. If you don’t know what accounts are available, contact your sysadmin.

    The Aurora Client command that actually runs our Job is aurora job create. It creates a Job as specified by its job key and configuration file arguments and runs it.

aurora job create <cluster>/<role>/<environment>/<jobname> <config_file>

    Or for our example:

aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora

    This returns:

$ vagrant ssh
Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

vagrant@precise64:~$ aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
 INFO] Response from scheduler: OK (message: 1 new tasks pending for job
  www-data/devel/hello_world)
 INFO] Job url: http://precise64:8081/scheduler/www-data/devel/hello_world

    Watching the Job Run

Now that our job is running, let’s see what it’s doing. Access the job’s scheduler UI page and inspect the failed task’s stderr output. It looks like we made a typo in our Python script. We wanted xrange, not xrang. Edit the hello_world.py script to use the correct function and we will try again.

aurora job update devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora

    This time, the task comes up, we inspect the page, and see that the hello_world process is running.


    Cleanup

    Now that we’re done, we kill the job using the Aurora client:

vagrant@precise64:~$ aurora job killall devcluster/www-data/devel/hello_world
 INFO] Killing tasks for job: devcluster/www-data/devel/hello_world
 INFO] Response from scheduler: OK (message: Tasks killed.)
 INFO] Job url: http://precise64:8081/scheduler/www-data/devel/hello_world
vagrant@precise64:~$

    The job page now shows the hello_world tasks as completed.

    Killed Task page

  • Explore the Aurora Client - use aurora -h, and read the Aurora Client Commands document.
Modified: aurora/site/publish/documentation/latest/user-guide/index.html

    The Executor implements a protocol for rudimentary control of a task via HTTP. Tasks subscribe for this protocol by declaring a port named health. Take for example this configuration snippet:

nginx = Process(
  name = 'nginx',
  cmdline = './run_nginx.sh -port {{thermos.ports[http]}}')

    When this Process is included in a job, the job will be allocated a port, and the command line will be replaced with something like:

./run_nginx.sh -port 42816

Where 42816 happens to be the allocated port. Typically, the Executor monitors Processes within a task only by liveness of the forked process. However, when a health port is allocated, it will also send periodic HTTP health checks. A task requesting a health port must handle the following …
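A task's side of the health protocol might look like the sketch below. Assumption not spelled out in this excerpt: the check is modeled here as an HTTP GET of /health answered with "ok"; consult the full user guide for the exact endpoints a task must implement.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer the executor's periodic health probe.
        if self.path == '/health':
            body, status = b'ok', 200
        else:
            body, status = b'not found', 404
        self.send_response(status)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Port 0 asks the OS for any free port, standing in for the allocated
# 'health' port the scheduler would hand the task.
server = HTTPServer(('127.0.0.1', 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = urllib.request.urlopen(
    'http://127.0.0.1:%d/health' % server.server_port).read()
print(reply)  # -> b'ok'
server.shutdown()
```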

    Part of the output from creating a new Job is a URL for the Job’s scheduler UI page.

    For example:

  vagrant@precise64:~$ aurora job create devcluster/www-data/prod/hello \
  /vagrant/examples/jobs/hello_world.aurora
  INFO] Creating job hello
  INFO] Response from scheduler: OK (message: 1 new tasks pending for job www-data/prod/hello)
  INFO] Job url: http://precise64:8081/scheduler/www-data/prod/hello

    The “Job url” goes to the Job’s scheduler UI page. To go to the overall scheduler UI page, stop at the “scheduler” part of the URL, in this case, http://precise64:8081/scheduler

    You can also reach the scheduler UI page via the Client command aurora job open:

  aurora job open [<cluster>[/<role>[/<env>/<job_name>]]]

    If only the cluster is specified, it goes directly to that cluster’s scheduler main page. If the role is specified, it goes to the top-level role page. If the full job key is specified, it goes directly to the job page where you can inspect individual tasks.
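The path-resolution rules in the previous paragraph can be sketched as a small helper. This is illustrative, not the client's actual code, and ui_path is an invented name:

```python
def ui_path(job_key):
    """Map a (possibly partial) job key to a scheduler UI path.

    Per the rules above: cluster alone -> main scheduler page,
    cluster/role -> top-level role page, full key -> job page. The
    cluster part only selects which scheduler to talk to, so it is
    dropped from the path itself.
    """
    cluster, *rest = job_key.split('/')
    return '/scheduler' + ''.join('/' + part for part in rest)

print(ui_path('devcluster'))                             # -> /scheduler
print(ui_path('devcluster/www-data'))                    # -> /scheduler/www-data
print(ui_path('devcluster/www-data/devel/hello_world'))  # -> /scheduler/www-data/devel/hello_world
```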


    See client commands.
