aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Santhosh Kumar Shanmugham <santhoshkuma...@gmail.com>
Subject Re: Review Request 53590: Change job updates to rely on `health-checks` rather than on `watch_secs`.
Date Thu, 17 Nov 2016 00:07:11 GMT


> On Nov. 16, 2016, 2:44 p.m., Zameer Manji wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py, line 118
> > <https://reviews.apache.org/r/53590/diff/5/?file=1565457#file1565457line118>
> >
> >     The only objection I have to this code is how we trigger `TASK_RUNNING`. The
> >     more I stare at this the more confusing I find it.
> >     
> >     It really seems like a hack to add `callback` to the status checker interface
> >     and rely on the fact that only the health checker implementation will call the
> >     call back.
> >     
> >     What if we get another status checker that may control if a task should
> >     transition to RUNNING or not?
> >     
> >     This also seems like an abstraction leak. The executor should be controlling
the
> >     state of the task and not the status checkers.
> >     
> >     Have you considered another approach where we instead signal running through
the
> >     `status` method of the status checker?
> >     
> >     Perhaps the return value can be changed from `None` for don't care, `RUNNING`
to
> >     signal running right away, `STARTING` to block progress to `RUNNING`, and
> >     everything else to trigger failure?
> >     
> >     Then instead of deleting this code here, we just poll the status checkers to
see
> >     if they all have consensus if we should enter RUNNING.
> >     
> >     So if they all return `None` we go to RUNNING. If all or some return RUNNING
and
> >     the rest return `None` we go to RUNNING. If one returns STARTING we poll until
> >     we reach the previous states.
> >     
> >     What do you think about that?
> 
> Zameer Manji wrote:
>     For example, we might make the `Announcer` prevent a task go into `RUNNING` unless
it is sucessfully announced.
> 
> Zameer Manji wrote:
>     Also, I know this diverges from the design doc, but after looking at the code I feel
we should step back a bit and reconsider. This is my bad for not raising this concern earlier
in the design doc.

I do like the idea of leveraging `status()` to indicate when to transition to `RUNNING`, without
breaking the `StatusCheckerProvider` abstraction. 

However the `AuroraExecutor` should not be doing the polling. `StatusManager` is already setup
to poll for status and trigger the `shutdown` callback when the task is unhealthy. We can
modify it to take `signal_running`. Note: `ChainedStatusChecker` will also need to be modified
to use the consensus logic.

Let me know what you think.


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53590/#review156131
-----------------------------------------------------------


On Nov. 16, 2016, 1:38 p.m., Santhosh Kumar Shanmugham wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53590/
> -----------------------------------------------------------
> 
> (Updated Nov. 16, 2016, 1:38 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1225
>     https://issues.apache.org/jira/browse/AURORA-1225
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Make RUNNING a first class state to indicate that the task is running
> and is healthy. It is achieved by introducing a new configuration
> parameter `min_consecutive_successes`, which will dictate when to move
> a task into RUNNING state.
> 
> With this change, it is possible to set the `watch_secs` to 0, so that
> updates are purely based on the task's health, rather than relying on
> watching the task to in RUNNING state for a pre-determined timeout.
> 
> 
> Diffs
> -----
> 
>   RELEASE-NOTES.md 392496284b7f046a5dd29a5a3f175305c28a5afe 
>   docs/development/design-documents.md 6bfc679c841ef1b0861758ccc12a5150f1bdb5e3 
>   docs/features/job-updates.md 792f2ae5fd14b1ea9af8be000629ce5a7fc2fe8f 
>   docs/features/services.md b6bfc9de550a714b4243cf033e9f3f03ae676af2 
>   docs/reference/configuration.md f2a0b1873f31e91f3bf0cac6f8448e8130fae688 
>   docs/reference/task-lifecycle.md 4dcb48149e156ef1927e63f5a9fa6b53b174e755 
>   src/main/python/apache/aurora/client/api/updater_util.py c649316edb876565c92cc90c9f030e153c008924

>   src/main/python/apache/aurora/client/config.py 0186af52f0d7d7e3981ec59bf6a01aafee2bcfb1

>   src/main/python/apache/aurora/config/schema/base.py 845163043b0b7b2f9e7aca14677ca9f094658551

>   src/main/python/apache/aurora/executor/aurora_executor.py aee5e56ae66ec7a6acbe97d94d6187fd8646ec9a

>   src/main/python/apache/aurora/executor/common/announcer.py 12d8e0af95d4c24c1b7a3b07d01b59cec4a15894

>   src/main/python/apache/aurora/executor/common/health_checker.py 3c7c09d173c4adbd89eddaee22cffcdd8d268378

>   src/main/python/apache/aurora/executor/common/resource_manager.py b7dc40d8973ec2e5998ab4f6ff988051a70bb1ab

>   src/main/python/apache/aurora/executor/common/status_checker.py 795dae2d6b661fc528d952c2315196d94127961f

>   src/test/python/apache/aurora/client/cli/test_inspect.py 7ef682d010660c9429d94d881dfd42d94f0ce5fd

>   src/test/python/apache/aurora/client/test_config.py 5cf68a5145ddf9478baa30453c0bcb73136fa7eb

>   src/test/python/apache/aurora/executor/common/test_announcer.py 58ca3a8d1ee248f050bab3d10d465c3260c15df3

>   src/test/python/apache/aurora/executor/common/test_health_checker.py da0c56ca084b65427a6122f22f251af8637772d1

>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py
fe74bd1d36666ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 3f82165333665d127e1ace765a7dada7348a4b91

>   src/test/sh/org/apache/aurora/e2e/http/http_example.aurora c9dae28b83d751cfdc7c09dc2d83ac3ee4852720

>   src/test/sh/org/apache/aurora/e2e/http/http_example_bad_healthcheck.aurora b85ace4bc33bda9483045732024302eb0725c89f

>   src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora b3caa4186217dbf17282a5780791ce3157f4389b

>   src/test/sh/org/apache/aurora/e2e/http_example.py 675ece85e2cf8e45f2c6e079e39b0e095075a3fa

>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 9cc6cec6633ee5973d8c9faa98ae24ab10cbeca4

> 
> Diff: https://reviews.apache.org/r/53590/diff/
> 
> 
> Testing
> -------
> 
> buils-support/jenkins/build.sh
> sh ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message