aurora-commits mailing list archives

Subject aurora git commit: Add support for receiving min_consecutive_successes in health checker
Date Thu, 06 Oct 2016 00:38:37 GMT
Repository: aurora
Updated Branches:
  refs/heads/master 640f07bab -> e91130e49

Add support for receiving min_consecutive_successes in health checker

- Add support for receiving a new HealthCheckConfig attribute
  "min_consecutive_successes" in health checker.
- Add an entry in release note that describes the health check driven update

This patch is related to, in which I added a
new configuration value "min_consecutive_successes" in HealthCheckConfig.

Testing Done:

./pants test.pytest src/test/python/apache/aurora/executor::


Bugs closed: AURORA-894

Reviewed at


Branch: refs/heads/master
Commit: e91130e49445c3933b6e27f5fde18c3a0e61b87a
Parents: 640f07b
Author: Kai Huang <>
Authored: Wed Oct 5 17:38:28 2016 -0700
Committer: Zameer Manji <>
Committed: Wed Oct 5 17:38:28 2016 -0700

----------------------------------------------------------------------
                                  |  5 ++++
 docs/features/                    | 26 +++++++++++++++++---
 .../apache/aurora/client/api/    |  4 +--
 .../aurora/executor/common/    |  3 ++-
 4 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/ b/
index 97f05d5..6968bb5 100644
--- a/
+++ b/
@@ -2,6 +2,11 @@
 ### New/updated:
+- Aurora scheduler job updater can now rely on health check status rather than `watch_secs`
+  when deciding an individual instance update state. This will potentially speed up updates as the
+  `minWaitInInstanceRunningMs` will no longer have to be chosen based on the worst observed
+  startup/warmup delay but rather as a desired health check duration.
 - A task's tier is now mapped to a label on the Mesos `TaskInfo` proto.
 ### Deprecations and removals:
diff --git a/docs/features/ b/docs/features/
index 792f2ae..c4ec42e 100644
--- a/docs/features/
+++ b/docs/features/
@@ -49,9 +49,29 @@ and performing these operations:
   the new config instance.
 The Aurora client continues through the instance list until all tasks are
-updated, in `RUNNING,` and healthy for a configurable amount of time.
-If the client determines the update is not going well (a percentage of health
-checks have failed), it cancels the update.
+updated. If the client determines the update is not going well (a percentage
+of health checks have failed), it cancels the update.
+Currently, the scheduler job updater uses two mechanisms to determine when
+to stop monitoring instance update state: a time-based grace interval and health
+check status.
+Job updates with health checks disabled (e.g. no ‘health’ port is defined
+in .aurora portmap) will rely on a time-based grace interval called [watch_secs]
+An instance will start executing task content when reaching `STARTING`
+state. Once the task sandbox is created, the instance is moved into `RUNNING`
+state. Afterward, the job updater will start the watch_secs countdown to ensure
+an instance is healthy, and then complete the update.
+Job updates with health check enabled will rely on health check status. When instance
+reaching `STARTING` state, health checks are performed periodically by the executor
+to ensure the instance is healthy. An instance is moved into `RUNNING` state only if
+a minimum number of consecutive successful health checks are performed
+during the initial warmup period (defined by [initial_interval_secs]
+(../reference/ If watch_secs is
+set as zero, the scheduler job updater will complete the update immediately.
+Otherwise, it will complete the update after the watch_secs expires.
 Update cancellation runs a procedure similar to the described above
 update sequence, but in reverse order. New instance configs are swapped
diff --git a/src/main/python/apache/aurora/client/api/ b/src/main/python/apache/aurora/client/api/
index c649316..ebeddab 100644
--- a/src/main/python/apache/aurora/client/api/
+++ b/src/main/python/apache/aurora/client/api/
@@ -35,8 +35,8 @@ class UpdaterConfig(object):
     if batch_size <= 0:
       raise ValueError('Batch size should be greater than 0')
-    if watch_secs <= 0:
-      raise ValueError('Watch seconds should be greater than 0')
+    if watch_secs < 0:
+      raise ValueError('Watch seconds should not be negative')
     if pulse_interval_secs is not None and pulse_interval_secs < self.MIN_PULSE_INTERVAL_SECONDS:
       raise ValueError('Pulse interval seconds must be at least %s seconds.'
                        % self.MIN_PULSE_INTERVAL_SECONDS)
diff --git a/src/main/python/apache/aurora/executor/common/ b/src/main/python/apache/aurora/executor/common/
index 03fbffd..1e0be10 100644
--- a/src/main/python/apache/aurora/executor/common/
+++ b/src/main/python/apache/aurora/executor/common/
@@ -331,6 +331,7 @@ class HealthCheckerProvider(StatusCheckerProvider):
-      max_consecutive_failures=health_check_config.get('max_consecutive_failures'))
+      max_consecutive_failures=health_check_config.get('max_consecutive_failures'),
+      min_consecutive_successes=health_check_config.get('min_consecutive_successes'))
     return health_checker
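
The documentation change above says an instance moves into `RUNNING` only after a minimum number of consecutive successful health checks. A minimal sketch of that counting rule follows. It is not the code from this commit: the class `ConsecutiveHealthTracker`, the helper `tracker_from_config`, and the default values are hypothetical and exist only to illustrate how `min_consecutive_successes` and `max_consecutive_failures` could interact.

```python
class ConsecutiveHealthTracker(object):
    """Hypothetical helper (not part of Aurora) that counts health check streaks."""

    def __init__(self, min_consecutive_successes=1, max_consecutive_failures=0):
        if min_consecutive_successes < 1:
            raise ValueError('min_consecutive_successes should be at least 1')
        if max_consecutive_failures < 0:
            raise ValueError('max_consecutive_failures should not be negative')
        self.min_consecutive_successes = min_consecutive_successes
        self.max_consecutive_failures = max_consecutive_failures
        self.successes = 0   # length of the current success streak
        self.failures = 0    # length of the current failure streak
        self.running = False  # True once the success threshold has been met
        self.failed = False   # True once the failure limit has been exceeded

    def record(self, healthy):
        """Record one health check result and return whether the threshold is met."""
        if healthy:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.min_consecutive_successes:
                self.running = True
        else:
            self.failures += 1
            self.successes = 0  # a failure breaks the success streak
            if self.failures > self.max_consecutive_failures:
                self.failed = True
        return self.running


def tracker_from_config(health_check_config):
    """Build a tracker from a dict shaped like the health_check_config above.

    The fallback values (1 and 0) are assumptions made for this sketch.
    """
    min_successes = health_check_config.get('min_consecutive_successes')
    max_failures = health_check_config.get('max_consecutive_failures')
    return ConsecutiveHealthTracker(
        min_consecutive_successes=1 if min_successes is None else min_successes,
        max_consecutive_failures=0 if max_failures is None else max_failures)


# Usage: one failure in the middle resets the streak, so two further
# successes are needed before the threshold is met.
tracker = tracker_from_config(
    {'min_consecutive_successes': 2, 'max_consecutive_failures': 1})
results = [tracker.record(r) for r in (True, False, True, True)]
```

Resetting the success counter on any failure is what makes the successes "consecutive"; a cumulative counter would let a flapping instance reach the threshold.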
