aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kai Huang <texasred2...@hotmail.com>
Subject Re: Review Request 51742: Modify the callback function passed to StatusManager
Date Fri, 09 Sep 2016 00:18:29 GMT


> On Sept. 8, 2016, 9:13 p.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py, lines 204-211
> > <https://reviews.apache.org/r/51742/diff/1/?file=1494643#file1494643line204>
> >
> >     I see a couple of problems with this method:
> >     
> >     1. The `status_result` argument passed in by `StatusManager#run` is not a `TaskState`,
it's a [StatusResult](https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/common/status_checker.py#L23-L50)
which has a `status` property whose value is a `TaskState`. So I believe your first check
will always fail in a real world scenario?
> >     2. If we receive multiple invocations of this callback for healthy tasks, the
second callback will result in the healthy task being killed (since `self.task_running` will
be `True`, the initial condition will evaluate to `False` and we'll land in the shutdown branch).
> >     
> >     These two issues combined make me wary of shipping these changes without the
corresponding changes further down the stack, as it makes it hard to evaluate their actual
effect. I appreciate trying to break the review up into manageable chunks, but in this case
I think it's having the opposite effect of causing reviewers to have to envision the contents
of future reviews to understand the impact of the changes in this review. On top of that,
the doubling up of sending the `TASK_RUNNING` update while the code is in the in-between state
also makes me nervous.
> >     
> >     We also definitely need test coverage for this callback.
> 
> Kai Huang wrote:
>     Thanks for the valid feedback. I'll work on them.
> 
> Kai Huang wrote:
>     Given that _status_result_handler is an instance method of Aurora executor, the unit
test of _status_result_handler is kind of interwined in the following tests in test_thermos_executor.py:
>     test_shutdown
>     test_task_health_check_ok
>     test_task_health_failed, etc.
>     
>     The complexity results from the attribute "task_running" in Aurora Executor, which
makes it hard to unit test this function. I'll think about if there is other way than relying
on the "task_running" field in aurora_executor.

Since the major blocker here is that we don't want to trigger the callback to handle TASK_RUNNING
status more than once, I suggest delegating the "check once" logic from Aurora Executor into
status manager, because that will make the unit tests much easier.


- Kai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51742/#review148258
-----------------------------------------------------------


On Sept. 8, 2016, 8:07 p.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51742/
> -----------------------------------------------------------
> 
> (Updated Sept. 8, 2016, 8:07 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-1225
>     https://issues.apache.org/jira/browse/AURORA-1225
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Modify the callback function passed to StatusManager to handle more StatusResult.
> 
> Currently, Aurora executor passes a callback to a status manager. The status manager
periodically polls the status of HealthChecker. When a non-empty status is deteted, the callback
function will shut down the status manager. 
> 
> In health check driven updates, the health checker will change its status to "TASK_RUNNING"
when a successful required number of health checks is reached. Therefore, we need to modify
the callback function, so that it won't shutdown the status manager if the task is in non-exit
state like "TASK_RUNNING".
> 
> Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225
> and the design doc: https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
for more details and background.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py ce5ef680f01831cd89fced8969ae3246c7f60cfd

>   src/main/python/apache/aurora/executor/status_manager.py 228a99a05f339e21cd7e769a42b9b2276e7bc3fc

>   src/test/python/apache/aurora/executor/test_status_manager.py ce4679ba1aa7b42cf0115c943d84663030182d23

> 
> Diff: https://reviews.apache.org/r/51742/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message