aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kai Huang <texasred2...@hotmail.com>
Subject Re: Review Request 51876: @ReviewBot retry Modify executor state transition logic to rely on health checks (if enabled)
Date Wed, 14 Sep 2016 17:02:52 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51876/#review148927
-----------------------------------------------------------




src/main/python/apache/aurora/executor/common/health_checker.py (line 137)
<https://reviews.apache.org/r/51876/#comment216446>

    It seems to me the order doesn't matter that much. However, I still have the feeling that
Maxim's proposal makes the expiration and failure count more close to their true definition,
and less ambuiguous. The tests should not dictate how we write the source code. Sometimes
we have to refactor them.
    
    Please feel free to weigh in.


- Kai Huang


On Sept. 14, 2016, 4:43 a.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51876/
> -----------------------------------------------------------
> 
> (Updated Sept. 14, 2016, 4:43 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-1225
>     https://issues.apache.org/jira/browse/AURORA-1225
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Modify executor state transition logic to rely on health checks (if enabled).
> 
> [Summary]
> Executor needs to start executing user content in STARTING and transition to RUNNING
when a successful required number of health checks is reached.
> 
> This review contains a series of executor changes that implement the health check driven
updates. It gives more context of the design of this feature.
> 
> [Background]
> Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225
> and the design doc: https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
for more details and background.
> 
> [Description]
> If health check is enabled on vCurrent executor, the health checker will send a "TASK_RUNNING"
message when a successful required number of health checks is reached within the initial_interval_secs.
On the other hand, a "TASK_FAILED" message was sent if the health checker fails to reach the
required number of health checks within that period, or a maximum number of failed health
check limit is reached after the initital_interval_secs.
> 
> If health check is disabled on the vCurrent executor, it will sends "TASK_RUNNING" message
to scheduler after the thermos runner was started. In this scenario, the behavior of vCurrent
executor will be the same as the vPrev executor.
> 
> [Change List]
> The current change set includes:
> 1. Removed the status memoization in ChainedStatusChecker.
> 2. Modified the StatusManager to be edge triggered.
> 3. Changed the Aurora Executor callback function.
> 4. Modified the Health Checker and redefined the meaning initial_interval_secs.
> 
> [TODOs]
> Currently I fixed all broken tests caused by my changes. However, more tests needs to
be added in order to accomodate the executor change. I will send follow-up review update when
I cover more edge cases. But any feedback on the implementation and tests is greatly appreciated.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py ce5ef680f01831cd89fced8969ae3246c7f60cfd

>   src/main/python/apache/aurora/executor/common/health_checker.py 5fc845eceac6f0c048d7489fdc4c672b0c609ea0

>   src/main/python/apache/aurora/executor/common/status_checker.py 795dae2d6b661fc528d952c2315196d94127961f

>   src/main/python/apache/aurora/executor/status_manager.py 228a99a05f339e21cd7e769a42b9b2276e7bc3fc

>   src/test/python/apache/aurora/executor/common/test_health_checker.py bb6ea69dd94298c5b8cf4d5f06d06eea7790d66e

>   src/test/python/apache/aurora/executor/common/test_status_checker.py 5be1981c8c8e88258456adb21aa3ca7c0aa472a7

>   src/test/python/apache/aurora/executor/test_status_manager.py ce4679ba1aa7b42cf0115c943d84663030182d23

>   src/test/python/apache/aurora/executor/test_thermos_executor.py 0bfe9e931f873c9f804f2ba4012e050e1f9fd24e

> 
> Diff: https://reviews.apache.org/r/51876/diff/
> 
> 
> Testing
> -------
> 
> ./build-support/jenkins/build.sh
> 
> ./pants test.pytest src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> Kai Huang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message