aurora-commits mailing list archives

Subject: incubator-aurora git commit: Documenting coordinated updates.
Date: Wed, 18 Feb 2015 00:41:08 GMT
Repository: incubator-aurora
Updated Branches:
  refs/heads/master 1a9551134 -> 4b43305b3

Documenting coordinated updates.

Bugs closed: AURORA-1012

Reviewed at


Branch: refs/heads/master
Commit: 4b43305b33cd8bebdd80225a3987b7cc7a8389a2
Parents: 1a95511
Author: Maxim Khutornenko <>
Authored: Tue Feb 17 16:40:58 2015 -0800
Committer: Maxim Khutornenko <>
Committed: Tue Feb 17 16:40:58 2015 -0800

 docs/         | 24 +++++++++++++++++++++++-
 docs/ |  3 +++
 2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/docs/ b/docs/
index a1b84fa..f0769b5 100644
--- a/docs/
+++ b/docs/
@@ -11,6 +11,7 @@ Aurora Client Commands
     - [Killing a Job](#killing-a-job)
     - [Updating a Job](#updating-a-job)
         - [Asynchronous job updates (beta)](#user-content-asynchronous-job-updates-beta)
+            - [Coordinated job updates (beta)](#user-content-coordinated-job-updates-beta)
     - [Renaming a Job](#renaming-a-job)
     - [Restarting Jobs](#restarting-jobs)
 - [Cron Jobs](#cron-jobs)
@@ -194,7 +195,7 @@ used to define and activate hooks for `job update`.
 #### Asynchronous job updates (beta)
-As of 0.6.0, Aurora will coordinate updates (and rollbacks) within the
+As of 0.6.0, Aurora will control and dispatch updates (and rollbacks) within the
 scheduler. Performing updates this way also allows the scheduler to display
 update progress and job update history in the browser.
@@ -222,6 +223,27 @@ You may `abort` a job update regardless of the state it is in. This will
 instruct the scheduler to completely abandon the job update and leave the job
 in the current (possibly partially-updated) state.
+##### Coordinated job updates (beta)
+Some Aurora services may benefit from having more control over the
+[asynchronous scheduler updater](#user-content-asynchronous-job-updates-beta) by explicitly
+acknowledging ("heartbeating") job update progress. This may be helpful for mission-critical
+service updates where explicit job health monitoring is vital during the entire job update
+lifecycle. Such job updates would rely on an external service (or a custom client) periodically
+pulsing an active coordinated job update via a
+[pulseJobUpdate RPC](../api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+A coordinated update is defined by setting a positive
+[pulse_interval_secs]( value in the job configuration
+file. If no pulses are received within the specified interval, the update will be blocked. A blocked
+update is unable to continue rolling forward (or rolling back) but retains its active status.
+It may only be unblocked by a fresh `pulseJobUpdate` call.
+NOTE: A coordinated update starts in `ROLL_FORWARD_AWAITING_PULSE` state and will not make
+progress until the first pulse arrives. However, a paused update (`ROLL_FORWARD_PAUSED` or
+`ROLL_BACK_PAUSED`) is still considered active and upon resuming will immediately make progress
+provided the pulse interval has not expired.
 ### Renaming a Job
 Renaming is a tricky operation as downstream clients must be informed of
diff --git a/docs/ b/docs/
index ee17591..0da4a9b 100644
--- a/docs/
+++ b/docs/
@@ -347,6 +347,9 @@ Parameters for controlling the rate and policy of rolling updates.
 | ```watch_secs```             | Integer  | Minimum number of seconds a shard must remain in ```RUNNING``` state before considered a success (Default: 45)
 | ```max_per_shard_failures``` | Integer  | Maximum number of restarts per shard during update. Increments total failure count when this limit is exceeded. (Default: 0)
 | ```max_total_failures```     | Integer  | Maximum number of shard failures to be tolerated in total during an update. Cannot be greater than or equal to the total number of tasks in a job. (Default: 0)
+| ```rollback_on_failure```    | boolean  | When False, prevents auto rollback of a failed update (Default: True)
+| ```wait_for_batch_completion```| boolean | When True, all threads from a given batch will be blocked from picking up new instances until the entire batch is updated. This essentially simulates the legacy sequential updater algorithm. (Default: False)
+| ```pulse_interval_secs```    | Integer  |  Indicates a [coordinated update]( If no pulses are received within the provided interval the update will be blocked. Beta-updater only. Will fail on submission when used with client updater. (Default: None)
 ### HealthCheckConfig Objects
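
The pulse semantics documented in this commit can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the scheduler's implementation: the `CoordinatedUpdate` class, its methods, and the `ROLLING_FORWARD` label are hypothetical names chosen for illustration, and reporting `ROLL_FORWARD_AWAITING_PULSE` again after a pulse expires is an assumption. Only `pulse_interval_secs`, `pulseJobUpdate`, and `ROLL_FORWARD_AWAITING_PULSE` come from the documentation above.

```python
from dataclasses import dataclass
from typing import Optional

# Named in the docs as the starting state of a coordinated update.
AWAITING_PULSE = "ROLL_FORWARD_AWAITING_PULSE"
# Assumed label for an unblocked update; not taken from the docs above.
ROLLING_FORWARD = "ROLLING_FORWARD"


@dataclass
class CoordinatedUpdate:
    """Hypothetical model of a coordinated update's pulse bookkeeping."""

    pulse_interval_secs: float
    last_pulse: Optional[float] = None  # timestamp of the most recent pulse

    def __post_init__(self) -> None:
        # The docs define a coordinated update by a positive interval.
        if self.pulse_interval_secs <= 0:
            raise ValueError("pulse_interval_secs must be positive")

    def pulse(self, now: float) -> None:
        """Stand-in for a pulseJobUpdate call arriving at time `now`."""
        self.last_pulse = now

    def is_blocked(self, now: float) -> bool:
        """Blocked until the first pulse, and again once the interval lapses."""
        if self.last_pulse is None:
            return True
        return now - self.last_pulse > self.pulse_interval_secs

    def state(self, now: float) -> str:
        # Assumption: a lapsed pulse reports the same awaiting-pulse state.
        return AWAITING_PULSE if self.is_blocked(now) else ROLLING_FORWARD


if __name__ == "__main__":
    update = CoordinatedUpdate(pulse_interval_secs=60)
    print(update.state(now=0))    # no pulse yet, so the update is blocked
    update.pulse(now=10)
    print(update.state(now=50))   # within the interval, so it may progress
    print(update.state(now=100))  # interval lapsed, so it is blocked again
```

In this model a blocked update stays active and resumes as soon as a fresh pulse arrives, mirroring the statement that only a new `pulseJobUpdate` call can unblock it.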
