Return-Path: X-Original-To: apmail-aurora-reviews-archive@minotaur.apache.org Delivered-To: apmail-aurora-reviews-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id BE17C18D9F for ; Fri, 11 Dec 2015 04:27:20 +0000 (UTC) Received: (qmail 79347 invoked by uid 500); 11 Dec 2015 04:27:20 -0000 Delivered-To: apmail-aurora-reviews-archive@aurora.apache.org Received: (qmail 79297 invoked by uid 500); 11 Dec 2015 04:27:20 -0000 Mailing-List: contact reviews-help@aurora.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: reviews@aurora.apache.org Delivered-To: mailing list reviews@aurora.apache.org Received: (qmail 79269 invoked by uid 99); 11 Dec 2015 04:27:20 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 11 Dec 2015 04:27:20 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id C37D82932B3; Fri, 11 Dec 2015 04:27:19 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============7293596724391691450==" MIME-Version: 1.0 Subject: Re: Review Request 41154: Add support for performing health checks with a shell command. From: "Joshua Cohen" To: "Bill Farner" , "Maxim Khutornenko" , "Zameer Manji" Cc: "Joshua Cohen" , "Dmitriy Shirchenko" , "Aurora" Date: Fri, 11 Dec 2015 04:27:19 -0000 Message-ID: <20151211042719.1618.11441@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: "Joshua Cohen" X-ReviewGroup: Aurora X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/41154/ X-Sender: "Joshua Cohen" References: <20151211012841.1619.51689@reviews.apache.org> In-Reply-To: <20151211012841.1619.51689@reviews.apache.org> Reply-To: "Joshua Cohen" X-ReviewRequest-Repository: aurora --===============7293596724391691450== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote: > > src/test/python/apache/aurora/common/BUILD, line 26 > > > > > > I don't think we need this dependency here. The tests under health_check will be picked up when we run a recrusive target regardless. > > Dmitriy Shirchenko wrote: > I think it's needed. > > Here are test that run w/the line removed for > ./pants test.pytest --no-fast src/test/python/apache/aurora/common:all > : > ``` > src.test.python.apache.aurora.common.test_aurora_job_key ..... SUCCESS > src.test.python.apache.aurora.common.test_cluster ..... SUCCESS > src.test.python.apache.aurora.common.test_cluster_option ..... SUCCESS > src.test.python.apache.aurora.common.test_clusters ..... SUCCESS > src.test.python.apache.aurora.common.test_pex_version ..... SUCCESS > src.test.python.apache.aurora.common.test_shellify ..... SUCCESS > src.test.python.apache.aurora.common.test_transport ..... SUCCESS > ``` > > Maybe you can suggest something else? I'm saying that running tests from common/health_check is not expected behavior when running common:all. If you wanted to run tests from common and all of its descendents you'd run ./pants test src/test/python/apache/aurora/common:: > On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote: > > src/main/python/apache/aurora/client/config.py, lines 98-110 > > > > > > I feel like this validation, while absolutely important on the client, should also be verified elsewhere... I don't think the Scheduler does anything other than pass this through to the executor, so probably the executor should validate it when starting the task? My concern is someone who does not use the client, but uses the API directly to create tasks would be able to send invalid config and either have undefined health check behavior, no health check, or just have a mysterious task failure. > > > > If we performed explicit validation in the executor when creating the health checker we at least could exit with an explicit reason that makes it clear that the health check config was invalid. > > Dmitriy Shirchenko wrote: > I completely agree, and those are valid concerns. I do feel that since AFAIK all of the validators suffer from this and it would be cleaner (and possibly safer) to add this feature in a different diff not to bloat this one. I think the main difference here is that today, it's impossible to construct an invalid HealthCheckConfig, the task would fail immediately on trying to deserialize the executor config if, say, the value for endpoint were not a String. However, after this change it would be possible to create a ShellHealthChecker with no command set, which would fail after the initial interval. Of course, I see your point... in practice this is not much different from configuring an http checker that is pointed to a non-existent endpoint... the end result would be the health check failing with an [internal health check error](https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/common/health_checker.py#L96-L99). I'd be ok with filing a ticket and adding a TODO in the executor to fail fast. > On Dec. 11, 2015, 1:28 a.m., Joshua Cohen wrote: > > src/main/python/apache/aurora/executor/common/health_checker.py, line 248 > > > > > > This behavior is slightly changed from the current behavior, where if the user did not opt in to health checks (by not binding a `health` port), we would return `None`. > > > > Now if they did not bind a `health` port, but type is `shell` with no command we'll return an invalidly configured HealthChecker (It will have a ShellHealthCheck with no cmd set which will eventually cause the executor to fail when health check is run). This ties in to the need to validate in the executor that I mentioned above. > > > > Beyond that though, I've found there's been value in being able to configure a health check, but then, e.g. not have the devel version of your task bind a `health` port, opting out of health checking for those tasks. I'm not sure if there's a great way to replicate that behavior for shell health checks (e.g. add an `enabled` field to `HealthCheckConfig`, seems a bit gross) and if it's a useful enough "feature" to warrant replicating, but I figured I'd bring it up for discussion regardless. > > Dmitriy Shirchenko wrote: > Default behavior when user doesn't specify a health check is the same ie None is returned as default type is 'http'. If type is changed to 'shell' then client would error. Are you suggesting that we don't error in that case? What I was getting at above (beyond the need to validate, which is really captured in my previous comment) is with the http checker the user has a way to configure a health checker and still opt out of health checking by not binding a health port (e.g. for devel instances). That is not possible with a shell health checker from what I can see. I don't know that we want to bend over backwards to codify that functionality, but I thought it was worth discussing. - Joshua ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41154/#review109901 ----------------------------------------------------------- On Dec. 11, 2015, 2:36 a.m., Dmitriy Shirchenko wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/41154/ > ----------------------------------------------------------- > > (Updated Dec. 11, 2015, 2:36 a.m.) > > > Review request for Aurora, Maxim Khutornenko, Bill Farner, and Zameer Manji. > > > Bugs: AURORA-1551 > https://issues.apache.org/jira/browse/AURORA-1551 > > > Repository: aurora > > > Description > ------- > > Adding support for non-HTTP health checks. > > > Diffs > ----- > > 3rdparty/python/requirements.txt cfef18ee66b0f92d83c53dacb6d9376fc2e50445 > docs/configuration-reference.md 364292998bebb233d300fe59c9ea42b216deee81 > src/main/python/apache/aurora/client/config.py 2fc12559016d406c347adb416a5166cca31c961e > src/main/python/apache/aurora/common/BUILD 5fce3d0d29d2a38c6563b4d9be963532e595ee19 > src/main/python/apache/aurora/common/health_check/__init__.py PRE-CREATION > src/main/python/apache/aurora/common/health_check/shell.py PRE-CREATION > src/main/python/apache/aurora/common/http_signaler.py a3193f3259276ec23d37f45839afe3c387cff6b1 > src/main/python/apache/aurora/config/schema/base.py 398f737bed9ef02ce4a5636896d6587bce26501e > src/main/python/apache/aurora/executor/common/health_checker.py 03fdf0afef120c365c6ffad09e152780eed7e351 > src/main/python/apache/aurora/executor/http_lifecycle.py 6d578cceb56375425ccac1cbfbbcd0add60f20e9 > src/test/python/apache/aurora/client/BUILD 84c5c845d9b1c8078ae8b47242d0f2ffc00ef6dc > src/test/python/apache/aurora/client/test_config.py b1a3c1865819899ef19173be0f861783a2631d0a > src/test/python/apache/aurora/common/BUILD 2556c32842b3cf7040cb3c41172a0d9c365cb649 > src/test/python/apache/aurora/common/health_check/BUILD PRE-CREATION > src/test/python/apache/aurora/common/health_check/__init__.py PRE-CREATION > src/test/python/apache/aurora/common/health_check/test_shell.py PRE-CREATION > src/test/python/apache/aurora/common/test_http_signaler.py f68c71a6765f7f0b93c8c50662515b5742344f35 > src/test/python/apache/aurora/executor/common/test_health_checker.py 27c71711d52f757ed1552db4accda671a6bdafdd > > Diff: https://reviews.apache.org/r/41154/diff/ > > > Testing > ------- > > Added unit tests. > Ran e2e test. > Tested expected behavior on virtual Mesos cluster. > > > Thanks, > > Dmitriy Shirchenko > > --===============7293596724391691450==--