aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dmitriy Shirchenko" <cald...@gmail.com>
Subject Re: Review Request 41154: Add support for performing health checks with a shell command.
Date Fri, 11 Dec 2015 07:05:47 GMT


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/client/config.py, lines 98-110
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159632#file1159632line98>
> >
> >     I feel like this validation, while absolutely important on the client, should
also be verified elsewhere... I don't think the Scheduler does anything other than pass this
through to the executor, so probably the executor should validate it when starting the task?
My concern is someone who does not use the client, but uses the API directly to create tasks
would be able to send invalid config and either have undefined health check behavior, no health
check, or just have a mysterious task failure.
> >     
> >     If we performed explicit validation in the executor when creating the health
checker we at least could exit with an explicit reason that makes it clear that the health
check config was invalid.
> 
> Dmitriy Shirchenko wrote:
>     I completely agree, and those are valid concerns. I do feel that since AFAIK all
of the validators suffer from this and it would be cleaner (and possibly safer) to add this
feature in a different diff not to bloat this one.
> 
> Joshua Cohen wrote:
>     I think the main difference here is that today, it's impossible to construct an invalid
HealthCheckConfig, the task would fail immediately on trying to deserialize the executor config
if, say, the value for endpoint were not a String. However, after this change it would be
possible to create a ShellHealthChecker with no command set, which would fail after the initial
interval. Of course, I see your point... in practice this is not much different from configuring
an http checker that is pointed to a non-existent endpoint... the end result would be the
health check failing with an [internal health check error](https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/common/health_checker.py#L96-L99).
>     
>     I'd be ok with filing a ticket and adding a TODO in the executor to fail fast.

https://issues.apache.org/jira/browse/AURORA-1552 and added as a TODO with task reference.


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/executor/common/health_checker.py, line 248
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159638#file1159638line248>
> >
> >     This behavior is slightly changed from the current behavior, where if the user
did not opt in to health checks (by not binding a `health` port), we would return `None`.
> >     
> >     Now if they did not bind a `health` port, but type is `shell` with no command
we'll return an invalidly configured HealthChecker (It will have a ShellHealthCheck with no
cmd set which will eventually cause the executor to fail when health check is run). This ties
in to the need to validate in the executor that I mentioned above.
> >     
> >     Beyond that though, I've found there's been value in being able to configure
a health check, but then, e.g. not have the devel version of your task bind a `health` port,
opting out of health checking for those tasks. I'm not sure if there's a great way to replicate
that behavior for shell health checks (e.g. add an `enabled` field to `HealthCheckConfig`,
seems a bit gross) and if it's a useful enough "feature" to warrant replicating, but I figured
I'd bring it up for discussion regardless.
> 
> Dmitriy Shirchenko wrote:
>     Default behavior when user doesn't specify a health check is the same ie None is
returned as default type is 'http'. If type is changed to 'shell' then client would error.
Are you suggesting that we don't error in that case?
> 
> Joshua Cohen wrote:
>     What I was getting at above (beyond the need to validate, which is really captured
in my previous comment) is with the http checker the user has a way to configure a health
checker and still opt out of health checking by not binding a health port (e.g. for devel
instances). That is not possible with a shell health checker from what I can see. I don't
know that we want to bend over backwards to codify that functionality, but I thought it was
worth discussing.

Yea totally. Thanks for bringing it up!


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > src/test/python/apache/aurora/common/BUILD, line 26
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159642#file1159642line26>
> >
> >     I don't think we need this dependency here. The tests under health_check will
be picked up when we run a recrusive target regardless.
> 
> Dmitriy Shirchenko wrote:
>     I think it's needed. 
>     
>     Here are test that run w/the line removed for
>     ./pants test.pytest --no-fast src/test/python/apache/aurora/common:all
>     :
>     ```
>                          src.test.python.apache.aurora.common.test_aurora_job_key   
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_cluster          
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_cluster_option   
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_clusters         
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_pex_version      
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_shellify         
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_transport        
                    .....   SUCCESS
>     ```
>     
>     Maybe you can suggest something else?
> 
> Joshua Cohen wrote:
>     I'm saying that running tests from common/health_check is not expected behavior when
running common:all. If you wanted to run tests from common and all of its descendents you'd
run ./pants test src/test/python/apache/aurora/common::

Ah, thanks! I was doing it wrong. Deleting that line.


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > src/test/python/apache/aurora/common/health_check/test_shell.py, line 39
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159645#file1159645line39>
> >
> >     Mind splitting these up into individual test cases?

Done.


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > docs/configuration-reference.md, line 397
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159631#file1159631line397>
> >
> >     "An alternative to HTTP health checking. Specifies a shell command that will
be executed. Any non-zero exit status will be interpreted as a health check failure."

Done.


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > docs/configuration-reference.md, line 396
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159631#file1159631line396>
> >
> >     s/that tolerated/that will be tolerated

Done.


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > docs/configuration-reference.md, line 394
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159631#file1159631line394>
> >
> >     s/an/a

Done.


- Dmitriy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41154/#review109901
-----------------------------------------------------------


On Dec. 11, 2015, 2:36 a.m., Dmitriy Shirchenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41154/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2015, 2:36 a.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko, Bill Farner, and Zameer Manji.
> 
> 
> Bugs: AURORA-1551
>     https://issues.apache.org/jira/browse/AURORA-1551
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Adding support for non-HTTP health checks.
> 
> 
> Diffs
> -----
> 
>   3rdparty/python/requirements.txt cfef18ee66b0f92d83c53dacb6d9376fc2e50445 
>   docs/configuration-reference.md 364292998bebb233d300fe59c9ea42b216deee81 
>   src/main/python/apache/aurora/client/config.py 2fc12559016d406c347adb416a5166cca31c961e

>   src/main/python/apache/aurora/common/BUILD 5fce3d0d29d2a38c6563b4d9be963532e595ee19

>   src/main/python/apache/aurora/common/health_check/__init__.py PRE-CREATION 
>   src/main/python/apache/aurora/common/health_check/shell.py PRE-CREATION 
>   src/main/python/apache/aurora/common/http_signaler.py a3193f3259276ec23d37f45839afe3c387cff6b1

>   src/main/python/apache/aurora/config/schema/base.py 398f737bed9ef02ce4a5636896d6587bce26501e

>   src/main/python/apache/aurora/executor/common/health_checker.py 03fdf0afef120c365c6ffad09e152780eed7e351

>   src/main/python/apache/aurora/executor/http_lifecycle.py 6d578cceb56375425ccac1cbfbbcd0add60f20e9

>   src/test/python/apache/aurora/client/BUILD 84c5c845d9b1c8078ae8b47242d0f2ffc00ef6dc

>   src/test/python/apache/aurora/client/test_config.py b1a3c1865819899ef19173be0f861783a2631d0a

>   src/test/python/apache/aurora/common/BUILD 2556c32842b3cf7040cb3c41172a0d9c365cb649

>   src/test/python/apache/aurora/common/health_check/BUILD PRE-CREATION 
>   src/test/python/apache/aurora/common/health_check/__init__.py PRE-CREATION 
>   src/test/python/apache/aurora/common/health_check/test_shell.py PRE-CREATION 
>   src/test/python/apache/aurora/common/test_http_signaler.py f68c71a6765f7f0b93c8c50662515b5742344f35

>   src/test/python/apache/aurora/executor/common/test_health_checker.py 27c71711d52f757ed1552db4accda671a6bdafdd

> 
> Diff: https://reviews.apache.org/r/41154/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests. 
> Ran e2e test.
> Tested expected behavior on virtual Mesos cluster.
> 
> 
> Thanks,
> 
> Dmitriy Shirchenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message