aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maxim Khutornenko <ma...@apache.org>
Subject Re: Review Request 51876: Modify executor state transition logic to rely on health checks (if enabled)
Date Wed, 21 Sep 2016 17:34:12 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51876/#review149846
-----------------------------------------------------------




api/src/main/thrift/org/apache/aurora/gen/api.thrift (lines 1218 - 1220)
<https://reviews.apache.org/r/51876/#comment217562>

    What's the motivation behind moving these constants into the thrift layer? FWICT, these
are only used inside the `health_checker.py` and should stay there.



src/main/python/apache/aurora/executor/aurora_executor.py (line 80)
<https://reviews.apache.org/r/51876/#comment217565>

    This does not have to be a global field, does it? It looks like `_register()` only adds
callbacks but nothing ever reads from it.



src/main/python/apache/aurora/executor/aurora_executor.py (lines 184 - 185)
<https://reviews.apache.org/r/51876/#comment217566>

    Are there any guarantees the status_manager will always deal with `ThermosTaskRunner.EXIT_STATE_MAP`
values? The way I see it any unmapped state will end up not calling the `_shutdown()` at all.
Perhaps a safer way could be having an explicit pair of `_running()` to be called only for
RUNNING state and `_shutdown()` act as a default callback for anything else? Curious what
others think here.



src/main/python/apache/aurora/executor/common/health_checker.py (line 140)
<https://reviews.apache.org/r/51876/#comment217570>

    leftover?



src/main/python/apache/aurora/executor/common/health_checker.py (lines 153 - 155)
<https://reviews.apache.org/r/51876/#comment217578>

    This should be moved into the `_maybe_update_failure_count()` to have all decision making
logic in a single place.



src/main/python/apache/aurora/executor/common/health_checker.py (line 225)
<https://reviews.apache.org/r/51876/#comment217582>

    This should be `mesos_pb2.TASK_RUNNING`, right?



src/main/python/apache/aurora/executor/common/health_checker.py (lines 270 - 274)
<https://reviews.apache.org/r/51876/#comment217563>

    This logic needs to be refactored to reduce duplication and reuse what's now in `is_health_check_enabled`
as much as possible. Ideally, we should have the only place we extract `health_checker` and
the like.



src/main/python/apache/aurora/executor/status_manager.py (line 68)
<https://reviews.apache.org/r/51876/#comment217567>

    This is exactly the problem I alluded to above. What if there is no entry for a given
`status` here?


- Maxim Khutornenko


On Sept. 20, 2016, 12:25 a.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51876/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2016, 12:25 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-1225
>     https://issues.apache.org/jira/browse/AURORA-1225
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Modify executor state transition logic to rely on health checks (if enabled).
> 
> [Summary]
> Executor needs to start executing user content in STARTING and transition to RUNNING
when a successful required number of health checks is reached.
> 
> This review contains a series of executor changes that implement the health check driven
updates. It gives more context of the design of this feature.
> 
> [Background]
> Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225
> and the design doc: https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
for more details and background.
> 
> [Description]
> If health check is enabled on vCurrent executor, the health checker will send a "TASK_RUNNING"
message when a successful required number of health checks is reached within the initial_interval_secs.
On the other hand, a "TASK_FAILED" message was sent if the health checker fails to reach the
required number of health checks within that period, or a maximum number of failed health
check limit is reached after the initital_interval_secs.
> 
> If health check is disabled on the vCurrent executor, it will sends "TASK_RUNNING" message
to scheduler after the thermos runner was started. In this scenario, the behavior of vCurrent
executor will be the same as the vPrev executor.
> 
> [Change List]
> The current change set includes:
> 1. Removed the status memoization in ChainedStatusChecker.
> 2. Modified the StatusManager to be edge triggered.
> 3. Changed the Aurora Executor callback function.
> 4. Modified the Health Checker and redefined the meaning initial_interval_secs.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift c5765b70501c101f0535b4eed94e9948c36808f9

>   src/main/python/apache/aurora/executor/aurora_executor.py ce5ef680f01831cd89fced8969ae3246c7f60cfd

>   src/main/python/apache/aurora/executor/common/health_checker.py 5fc845eceac6f0c048d7489fdc4c672b0c609ea0

>   src/main/python/apache/aurora/executor/common/status_checker.py 795dae2d6b661fc528d952c2315196d94127961f

>   src/main/python/apache/aurora/executor/common/task_info.py 4ef49e30eeb942886d02c1fb448055264f9aa874

>   src/main/python/apache/aurora/executor/status_manager.py 228a99a05f339e21cd7e769a42b9b2276e7bc3fc

>   src/test/python/apache/aurora/executor/common/test_health_checker.py bb6ea69dd94298c5b8cf4d5f06d06eea7790d66e

>   src/test/python/apache/aurora/executor/common/test_status_checker.py 5be1981c8c8e88258456adb21aa3ca7c0aa472a7

>   src/test/python/apache/aurora/executor/common/test_task_info.py 0d9cc5cb340d697a887d8e001f23c948f4fa70c7

>   src/test/python/apache/aurora/executor/test_status_manager.py ce4679ba1aa7b42cf0115c943d84663030182d23

>   src/test/python/apache/aurora/executor/test_thermos_executor.py 0bfe9e931f873c9f804f2ba4012e050e1f9fd24e

> 
> Diff: https://reviews.apache.org/r/51876/diff/
> 
> 
> Testing
> -------
> 
> ./build-support/jenkins/build.sh
> 
> ./pants test.pytest src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> Kai Huang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message