aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joshua Cohen <jco...@apache.org>
Subject Re: Review Request 51876: Modify executor state transition logic to rely on health checks (if enabled)
Date Fri, 23 Sep 2016 16:53:54 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51876/#review150200
-----------------------------------------------------------




src/main/python/apache/aurora/executor/aurora_executor.py (line 81)
<https://reviews.apache.org/r/51876/#comment218071>

    I don't think we should expose this simply for the sake of testing, besides breaking abstractions,
it's also brittle. A bug could set this value to `True` even if the behavior we want to test
is broken.
    
    We should be able to test the health check logic directly based on the behavior of the
executor. I.e. assert that the `TASK_RUNNING` update is not sent right away if one of the
status providers is a health checker.



src/main/python/apache/aurora/executor/aurora_executor.py (lines 122 - 133)
<https://reviews.apache.org/r/51876/#comment218069>

    I don't think we need to move all of this logic from `_start_status_manager`. We only
need to move the creation of the status providers from `self._status_providers` here.
    
    The rest (registering them with metrics, adding `self._kill_manager` to the list) can
remain in `_start_status_manager`.



src/main/python/apache/aurora/executor/aurora_executor.py (line 183)
<https://reviews.apache.org/r/51876/#comment218073>

    Not related to your change, but the `driver` argument is unused here. Can you drop it?



src/main/python/apache/aurora/executor/common/health_checker.py (line 36)
<https://reviews.apache.org/r/51876/#comment218077>

    kill this blank line?



src/main/python/apache/aurora/executor/common/health_checker.py (line 76)
<https://reviews.apache.org/r/51876/#comment218083>

    This is the oppositve of `current_consecutive_failures`, right? I.e. it's not the total
number of healthchecks that have been performed, but instead the number that have passed,
yes?
    
    If so, consider renaming to `current_consecutive_successes` for symmetry.



src/main/python/apache/aurora/executor/common/health_checker.py (lines 219 - 220)
<https://reviews.apache.org/r/51876/#comment218085>

    Fits on one line.



src/main/python/apache/aurora/executor/status_manager.py (line 50)
<https://reviews.apache.org/r/51876/#comment218076>

    Why pass these as a tuple instead of individual args?


- Joshua Cohen


On Sept. 23, 2016, 12:57 a.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51876/
> -----------------------------------------------------------
> 
> (Updated Sept. 23, 2016, 12:57 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-1225
>     https://issues.apache.org/jira/browse/AURORA-1225
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Modify executor state transition logic to rely on health checks (if enabled).
> 
> [Summary]
> Executor needs to start executing user content in STARTING and transition to RUNNING
when a successful required number of health checks is reached.
> 
> This review contains a series of executor changes that implement the health check driven
updates. It gives more context of the design of this feature.
> 
> [Background]
> Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225
> and the design doc: https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
for more details and background.
> 
> [Description]
> If health check is enabled on vCurrent executor, the health checker will send a "TASK_RUNNING"
message when a successful required number of health checks is reached within the initial_interval_secs.
On the other hand, a "TASK_FAILED" message was sent if the health checker fails to reach the
required number of health checks within that period, or a maximum number of failed health
check limit is reached after the initital_interval_secs.
> 
> If health check is disabled on the vCurrent executor, it will sends "TASK_RUNNING" message
to scheduler after the thermos runner was started. In this scenario, the behavior of vCurrent
executor will be the same as the vPrev executor.
> 
> [Change List]
> The current change set includes:
> 1. Removed the status memoization in ChainedStatusChecker.
> 2. Modified the StatusManager to be edge triggered.
> 3. Changed the Aurora Executor callback function.
> 4. Modified the Health Checker and redefined the meaning initial_interval_secs.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py ce5ef680f01831cd89fced8969ae3246c7f60cfd

>   src/main/python/apache/aurora/executor/common/health_checker.py 5fc845eceac6f0c048d7489fdc4c672b0c609ea0

>   src/main/python/apache/aurora/executor/common/status_checker.py 795dae2d6b661fc528d952c2315196d94127961f

>   src/main/python/apache/aurora/executor/status_manager.py 228a99a05f339e21cd7e769a42b9b2276e7bc3fc

>   src/test/python/apache/aurora/executor/common/test_health_checker.py bb6ea69dd94298c5b8cf4d5f06d06eea7790d66e

>   src/test/python/apache/aurora/executor/common/test_status_checker.py 5be1981c8c8e88258456adb21aa3ca7c0aa472a7

>   src/test/python/apache/aurora/executor/test_status_manager.py ce4679ba1aa7b42cf0115c943d84663030182d23

>   src/test/python/apache/aurora/executor/test_thermos_executor.py 0bfe9e931f873c9f804f2ba4012e050e1f9fd24e

> 
> Diff: https://reviews.apache.org/r/51876/diff/
> 
> 
> Testing
> -------
> 
> ./build-support/jenkins/build.sh
> 
> ./pants test.pytest src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> Kai Huang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message