aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kai Huang <texasred2...@hotmail.com>
Subject Re: Review Request 51876: Modify executor state transition logic to rely on health checks (if enabled)
Date Tue, 20 Sep 2016 00:25:26 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51876/
-----------------------------------------------------------

(Updated Sept. 20, 2016, 12:25 a.m.)


Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Changes
-------

Fixed one flaky test for Thermos Executor. Added unit tests for the modified components in
the notification chain.


Summary (updated)
-----------------

Modify executor state transition logic to rely on health checks (if enabled)


Bugs: AURORA-1225
    https://issues.apache.org/jira/browse/AURORA-1225


Repository: aurora


Description (updated)
-------

Modify executor state transition logic to rely on health checks (if enabled).

[Summary]
Executor needs to start executing user content in STARTING and transition to RUNNING when
a successful required number of health checks is reached.

This review contains a series of executor changes that implement the health check driven updates.
It gives more context of the design of this feature.

[Background]
Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225
and the design doc: https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
for more details and background.

[Description]
If health check is enabled on vCurrent executor, the health checker will send a "TASK_RUNNING"
message when a successful required number of health checks is reached within the initial_interval_secs.
On the other hand, a "TASK_FAILED" message was sent if the health checker fails to reach the
required number of health checks within that period, or a maximum number of failed health
check limit is reached after the initital_interval_secs.

If health check is disabled on the vCurrent executor, it will sends "TASK_RUNNING" message
to scheduler after the thermos runner was started. In this scenario, the behavior of vCurrent
executor will be the same as the vPrev executor.

[Change List]
The current change set includes:
1. Removed the status memoization in ChainedStatusChecker.
2. Modified the StatusManager to be edge triggered.
3. Changed the Aurora Executor callback function.
4. Modified the Health Checker and redefined the meaning initial_interval_secs.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift c5765b70501c101f0535b4eed94e9948c36808f9

  src/main/python/apache/aurora/executor/aurora_executor.py ce5ef680f01831cd89fced8969ae3246c7f60cfd

  src/main/python/apache/aurora/executor/common/health_checker.py 5fc845eceac6f0c048d7489fdc4c672b0c609ea0

  src/main/python/apache/aurora/executor/common/status_checker.py 795dae2d6b661fc528d952c2315196d94127961f

  src/main/python/apache/aurora/executor/common/task_info.py 4ef49e30eeb942886d02c1fb448055264f9aa874

  src/main/python/apache/aurora/executor/status_manager.py 228a99a05f339e21cd7e769a42b9b2276e7bc3fc

  src/test/python/apache/aurora/executor/common/test_health_checker.py bb6ea69dd94298c5b8cf4d5f06d06eea7790d66e

  src/test/python/apache/aurora/executor/common/test_status_checker.py 5be1981c8c8e88258456adb21aa3ca7c0aa472a7

  src/test/python/apache/aurora/executor/common/test_task_info.py 0d9cc5cb340d697a887d8e001f23c948f4fa70c7

  src/test/python/apache/aurora/executor/test_status_manager.py ce4679ba1aa7b42cf0115c943d84663030182d23

  src/test/python/apache/aurora/executor/test_thermos_executor.py 0bfe9e931f873c9f804f2ba4012e050e1f9fd24e


Diff: https://reviews.apache.org/r/51876/diff/


Testing
-------

./build-support/jenkins/build.sh

./pants test.pytest src/test/python/apache/aurora/executor::


Thanks,

Kai Huang


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message