aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joshua Cohen" <jco...@apache.org>
Subject Re: Review Request 41154: Add support for performing health checks with a shell command.
Date Fri, 11 Dec 2015 04:27:19 GMT


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > src/test/python/apache/aurora/common/BUILD, line 26
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159642#file1159642line26>
> >
> >     I don't think we need this dependency here. The tests under health_check will
be picked up when we run a recrusive target regardless.
> 
> Dmitriy Shirchenko wrote:
>     I think it's needed. 
>     
>     Here are test that run w/the line removed for
>     ./pants test.pytest --no-fast src/test/python/apache/aurora/common:all
>     :
>     ```
>                          src.test.python.apache.aurora.common.test_aurora_job_key   
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_cluster          
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_cluster_option   
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_clusters         
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_pex_version      
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_shellify         
                    .....   SUCCESS
>                          src.test.python.apache.aurora.common.test_transport        
                    .....   SUCCESS
>     ```
>     
>     Maybe you can suggest something else?

I'm saying that running tests from common/health_check is not expected behavior when running
common:all. If you wanted to run tests from common and all of its descendents you'd run ./pants
test src/test/python/apache/aurora/common::


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/client/config.py, lines 98-110
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159632#file1159632line98>
> >
> >     I feel like this validation, while absolutely important on the client, should
also be verified elsewhere... I don't think the Scheduler does anything other than pass this
through to the executor, so probably the executor should validate it when starting the task?
My concern is someone who does not use the client, but uses the API directly to create tasks
would be able to send invalid config and either have undefined health check behavior, no health
check, or just have a mysterious task failure.
> >     
> >     If we performed explicit validation in the executor when creating the health
checker we at least could exit with an explicit reason that makes it clear that the health
check config was invalid.
> 
> Dmitriy Shirchenko wrote:
>     I completely agree, and those are valid concerns. I do feel that since AFAIK all
of the validators suffer from this and it would be cleaner (and possibly safer) to add this
feature in a different diff not to bloat this one.

I think the main difference here is that today, it's impossible to construct an invalid HealthCheckConfig,
the task would fail immediately on trying to deserialize the executor config if, say, the
value for endpoint were not a String. However, after this change it would be possible to create
a ShellHealthChecker with no command set, which would fail after the initial interval. Of
course, I see your point... in practice this is not much different from configuring an http
checker that is pointed to a non-existent endpoint... the end result would be the health check
failing with an [internal health check error](https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/common/health_checker.py#L96-L99).

I'd be ok with filing a ticket and adding a TODO in the executor to fail fast.


> On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/executor/common/health_checker.py, line 248
> > <https://reviews.apache.org/r/41154/diff/4/?file=1159638#file1159638line248>
> >
> >     This behavior is slightly changed from the current behavior, where if the user
did not opt in to health checks (by not binding a `health` port), we would return `None`.
> >     
> >     Now if they did not bind a `health` port, but type is `shell` with no command
we'll return an invalidly configured HealthChecker (It will have a ShellHealthCheck with no
cmd set which will eventually cause the executor to fail when health check is run). This ties
in to the need to validate in the executor that I mentioned above.
> >     
> >     Beyond that though, I've found there's been value in being able to configure
a health check, but then, e.g. not have the devel version of your task bind a `health` port,
opting out of health checking for those tasks. I'm not sure if there's a great way to replicate
that behavior for shell health checks (e.g. add an `enabled` field to `HealthCheckConfig`,
seems a bit gross) and if it's a useful enough "feature" to warrant replicating, but I figured
I'd bring it up for discussion regardless.
> 
> Dmitriy Shirchenko wrote:
>     Default behavior when user doesn't specify a health check is the same ie None is
returned as default type is 'http'. If type is changed to 'shell' then client would error.
Are you suggesting that we don't error in that case?

What I was getting at above (beyond the need to validate, which is really captured in my previous
comment) is with the http checker the user has a way to configure a health checker and still
opt out of health checking by not binding a health port (e.g. for devel instances). That is
not possible with a shell health checker from what I can see. I don't know that we want to
bend over backwards to codify that functionality, but I thought it was worth discussing.


- Joshua


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41154/#review109901
-----------------------------------------------------------


On Dec. 11, 2015, 2:36 a.m., Dmitriy Shirchenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41154/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2015, 2:36 a.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko, Bill Farner, and Zameer Manji.
> 
> 
> Bugs: AURORA-1551
>     https://issues.apache.org/jira/browse/AURORA-1551
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Adding support for non-HTTP health checks.
> 
> 
> Diffs
> -----
> 
>   3rdparty/python/requirements.txt cfef18ee66b0f92d83c53dacb6d9376fc2e50445 
>   docs/configuration-reference.md 364292998bebb233d300fe59c9ea42b216deee81 
>   src/main/python/apache/aurora/client/config.py 2fc12559016d406c347adb416a5166cca31c961e

>   src/main/python/apache/aurora/common/BUILD 5fce3d0d29d2a38c6563b4d9be963532e595ee19

>   src/main/python/apache/aurora/common/health_check/__init__.py PRE-CREATION 
>   src/main/python/apache/aurora/common/health_check/shell.py PRE-CREATION 
>   src/main/python/apache/aurora/common/http_signaler.py a3193f3259276ec23d37f45839afe3c387cff6b1

>   src/main/python/apache/aurora/config/schema/base.py 398f737bed9ef02ce4a5636896d6587bce26501e

>   src/main/python/apache/aurora/executor/common/health_checker.py 03fdf0afef120c365c6ffad09e152780eed7e351

>   src/main/python/apache/aurora/executor/http_lifecycle.py 6d578cceb56375425ccac1cbfbbcd0add60f20e9

>   src/test/python/apache/aurora/client/BUILD 84c5c845d9b1c8078ae8b47242d0f2ffc00ef6dc

>   src/test/python/apache/aurora/client/test_config.py b1a3c1865819899ef19173be0f861783a2631d0a

>   src/test/python/apache/aurora/common/BUILD 2556c32842b3cf7040cb3c41172a0d9c365cb649

>   src/test/python/apache/aurora/common/health_check/BUILD PRE-CREATION 
>   src/test/python/apache/aurora/common/health_check/__init__.py PRE-CREATION 
>   src/test/python/apache/aurora/common/health_check/test_shell.py PRE-CREATION 
>   src/test/python/apache/aurora/common/test_http_signaler.py f68c71a6765f7f0b93c8c50662515b5742344f35

>   src/test/python/apache/aurora/executor/common/test_health_checker.py 27c71711d52f757ed1552db4accda671a6bdafdd

> 
> Diff: https://reviews.apache.org/r/41154/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests. 
> Ran e2e test.
> Tested expected behavior on virtual Mesos cluster.
> 
> 
> Thanks,
> 
> Dmitriy Shirchenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message