aurora-reviews mailing list archives

From Kai Huang <>
Subject Re: Review Request 51876: Modify executor state transition logic to rely on health checks (if enabled)
Date Tue, 27 Sep 2016 22:25:42 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Sept. 27, 2016, 10:25 p.m.)

Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Exported a new metric, "consecutive_successes", in the health checker. Some code cleanup.

Bugs: AURORA-1225

Repository: aurora


Modify executor state transition logic to rely on health checks (if enabled).

The executor needs to start executing user content in STARTING and transition to RUNNING once
the required number of successful health checks is reached.

This review contains a series of executor changes that implement health-check-driven updates.
They are presented together to give more context on the design of this feature.

Please see this epic:
and the design doc:
for more details and background.

If health checking is enabled on the vCurrent executor, the health checker sends a "TASK_RUNNING"
message once the required number of successful health checks is reached within initial_interval_secs.
Conversely, it sends a "TASK_FAILED" message if it fails to reach the required number of successful
health checks within that period, or if the maximum number of failed health checks is reached
after initial_interval_secs.
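
The rule above can be sketched roughly as follows. This is an illustration only, not the code
under review: the function name, the parameters min_consecutive_successes and failure_limit, and
the choice to count failures consecutively are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

TASK_RUNNING = "TASK_RUNNING"
TASK_FAILED = "TASK_FAILED"


def simulate_status_updates(
    checks: Iterable[Tuple[float, bool]],
    initial_interval_secs: float,
    min_consecutive_successes: int,  # hypothetical name
    failure_limit: int,              # hypothetical name
) -> List[str]:
    """Return the status updates a health checker might emit.

    `checks` holds (seconds_since_start, healthy) pairs in time order.
    """
    updates: List[str] = []
    successes = 0
    failures = 0
    running = False
    for elapsed, healthy in checks:
        # Track streaks; a result of one kind resets the other streak (assumption).
        if healthy:
            successes += 1
            failures = 0
        else:
            failures += 1
            successes = 0

        if not running:
            if elapsed > initial_interval_secs:
                # The window closed before enough successes were seen.
                updates.append(TASK_FAILED)
                break
            if successes >= min_consecutive_successes:
                running = True
                updates.append(TASK_RUNNING)
        elif elapsed > initial_interval_secs and failures >= failure_limit:
            # After the window, reaching the failure limit fails the task.
            updates.append(TASK_FAILED)
            break
    return updates
```

For example, with initial_interval_secs=10, two required successes, and a failure limit of two,
the checks [(1, False), (2, True), (3, True)] would yield a single "TASK_RUNNING" update.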

If health checking is disabled on the vCurrent executor, it sends a "TASK_RUNNING" message
to the scheduler after the thermos runner has started. In this scenario, the vCurrent executor
behaves the same as the vPrev executor.
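
As a small sketch of that branch (the function name and the use of None to mean "no RUNNING
update yet" are assumptions for illustration, not the executor's actual interface):

```python
from typing import Optional


def update_on_runner_start(health_check_enabled: bool) -> Optional[str]:
    """Status update to send right after the thermos runner starts (hypothetical helper)."""
    if health_check_enabled:
        # Stay in STARTING; the health checker decides when to report RUNNING.
        return None
    # No health checks: report RUNNING immediately, matching the vPrev behavior.
    return "TASK_RUNNING"
```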

[Change List]
The current change set includes:
1. Removed the status memoization in ChainedStatusChecker.
2. Modified the StatusManager to be edge triggered.
3. Changed the Aurora Executor callback function.
4. Modified the Health Checker and redefined the meaning of initial_interval_secs.
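
To illustrate what "edge triggered" means in item 2, here is a minimal sketch, assuming a
poll function that returns the current status (or None) and a callback that should fire only
when that status changes. The class and its names are hypothetical and are not the actual
StatusManager. It also hints at why memoization would get in the way: a checker that caches
its first non-None status could never report a later transition.

```python
from typing import Callable, List, Optional


class EdgeTriggeredWatcher:
    """Invoke `on_change` only when the polled status differs from the last one seen."""

    def __init__(
        self,
        poll: Callable[[], Optional[str]],
        on_change: Callable[[str], None],
    ) -> None:
        self._poll = poll
        self._on_change = on_change
        self._last: Optional[str] = None

    def tick(self) -> None:
        status = self._poll()
        # Ignore "no status yet" and repeats of the status already reported.
        if status is not None and status != self._last:
            self._last = status
            self._on_change(status)


# Usage: four polls, but only two transitions are reported.
seen: List[str] = []
polled = iter([None, "TASK_RUNNING", "TASK_RUNNING", "TASK_FAILED"])
watcher = EdgeTriggeredWatcher(lambda: next(polled), seen.append)
for _ in range(4):
    watcher.tick()
print(seen)  # ['TASK_RUNNING', 'TASK_FAILED']
```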

Diffs (updated)

  src/main/python/apache/aurora/executor/ ce5ef680f01831cd89fced8969ae3246c7f60cfd

  src/main/python/apache/aurora/executor/common/ 5fc845eceac6f0c048d7489fdc4c672b0c609ea0

  src/main/python/apache/aurora/executor/common/ 795dae2d6b661fc528d952c2315196d94127961f

  src/main/python/apache/aurora/executor/ 228a99a05f339e21cd7e769a42b9b2276e7bc3fc

  src/test/python/apache/aurora/executor/common/ bb6ea69dd94298c5b8cf4d5f06d06eea7790d66e

  src/test/python/apache/aurora/executor/common/ 5be1981c8c8e88258456adb21aa3ca7c0aa472a7

  src/test/python/apache/aurora/executor/ ce4679ba1aa7b42cf0115c943d84663030182d23

  src/test/python/apache/aurora/executor/ 0bfe9e931f873c9f804f2ba4012e050e1f9fd24e




./pants test.pytest src/test/python/apache/aurora/executor::


Kai Huang
