aurora-dev mailing list archives

From "Erb, Stephan" <Stephan....@blue-yonder.com>
Subject Re: RFC HealthCheck
Date Sat, 21 Feb 2015 11:48:11 GMT
Hi Florian,

Have you looked at what Mesos already offers out of the box [1]? Maybe there is a way
to implement your features by relying on Mesos directly, instead of making the Aurora implementation
more flexible.

As you've mentioned, the lifecycle endpoints abort and quit seem to be quite orthogonal to
the health checking idea. I would be in favor of separating the different concepts. I even
thought about this yesterday, because in our environment we only want health checking but
now also have to pay a price of 10 seconds of additional latency when stopping jobs due to the graceful
kill escalation.

[1] https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L141


Regards,
Stephan

________________________________________
From: Florian Pfeiffer <florian.pfeiffer@gutefrage.net>
Sent: Saturday, February 21, 2015 4:27 AM
To: dev@aurora.incubator.apache.org
Subject: RFC HealthCheck

Hi,

I would like to start working on the health checker:

1) Enable configuration of the name of the port on which health checks run (this should also tackle
AURORA-321).
This seems like a very small change consisting of adding a new variable named "port" to
the HealthCheckConfig in base.py with a default value of "health" to be backwards compatible.
Any pitfalls? Any objections?
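
To make the proposal in 1) concrete, here is a minimal sketch. It uses a plain dataclass as a stand-in rather than Aurora's actual schema class in base.py; the class name, the interval_secs field, and the resolve_health_port helper are all hypothetical and only illustrate how a "port" variable defaulting to "health" would stay backwards compatible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheckConfigSketch:
    """Hypothetical stand-in for HealthCheckConfig, not Aurora's real class."""
    interval_secs: float = 10.0  # illustrative field, value is an assumption
    port: str = "health"         # proposed: name of the port to health-check


def resolve_health_port(config, named_ports):
    """Return the port number for the configured port name.

    `named_ports` maps port names to allocated port numbers. Returns None
    if the task did not request a port with that name.
    """
    return named_ports.get(config.port)


# Existing configs that never set `port` keep checking the "health" port.
ports = {"health": 31000, "http": 31001}
default_port = resolve_health_port(HealthCheckConfigSketch(), ports)
custom_port = resolve_health_port(HealthCheckConfigSketch(port="http"), ports)
```

One possible pitfall this makes visible: a configured name that the task never requested resolves to None, so the caller would have to decide whether that disables the check or fails the task.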

2) There’s at least one ticket in JIRA about making the endpoints for the health
check configurable. I would like to have a health check that works on HTTP status codes, and
there might be other people who are fine with a health check that only verifies that a
TCP connection can be made.

For my use case I would probably be fine if I added a variable "method" to the HealthCheckConfig,
with a default value of "classic" for the current behavior and something
like "statuscode" for a check that is very similar to the current one in http_signaler.py
but checks the status code instead of parsing the response (with the downside that the endpoints
/health, /abort and /quitquitquit are still hardcoded).
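
As an illustration of the "statuscode" idea, the following sketch uses only the Python standard library and is not based on the code in http_signaler.py. The function name, its parameters, and the injectable opener argument are assumptions made for this example.

```python
import urllib.error
import urllib.request


def status_code_check(url, expected_code=200, timeout_secs=1.0,
                      opener=urllib.request.urlopen):
    """Return (healthy, reason), judging only by the HTTP status code.

    All names are hypothetical. `opener` is injectable so the function
    can be exercised without a live endpoint.
    """
    try:
        with opener(url, timeout=timeout_secs) as response:
            code = response.getcode()
    except urllib.error.HTTPError as e:
        # urlopen raises on error responses such as 4xx/5xx; the status
        # code is still available on the exception.
        code = e.code
    except OSError as e:
        # Covers URLError, refused connections and socket timeouts.
        return False, "request failed: %s" % e
    if code == expected_code:
        return True, "status %d" % code
    return False, "expected status %d, got %d" % (expected_code, code)
```

A caller might use it as status_code_check("http://localhost:31000/health"), where the host and port are placeholders. The response body is never read, which is the difference from a check that parses the response.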

Any ideas how this can be made a little more generic, so that if we have 3-5 different types
of health checks we can pass different arguments to each one? (e.g. expected_response
for the current one, expected_code for the status code checker, and maybe something
like max_response_time for defining how fast traffic has to appear on a TCP connection check)
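
One possible shape for this, sketched under assumptions: a registry maps each "method" name to a checker function, and each checker declares its own keyword arguments. The registry, the observation dictionary, and the default values (including "ok" as the expected response) are hypothetical. The sketch separates evaluating a result from probing the task, so it contains no network code.

```python
CHECKERS = {}


def register(method):
    """Register a checker function under a health check method name."""
    def decorator(func):
        CHECKERS[method] = func
        return func
    return decorator


@register("classic")
def classic(observation, expected_response="ok"):
    # Compares the response body; the "ok" default is an assumption.
    body = observation.get("body") or ""
    return body.strip().lower() == expected_response


@register("statuscode")
def statuscode(observation, expected_code=200):
    return observation.get("status") == expected_code


@register("tcp")
def tcp(observation, max_response_time=1.0):
    # elapsed_secs is None when no connection could be made.
    elapsed = observation.get("elapsed_secs")
    return elapsed is not None and elapsed <= max_response_time


def run_check(method, observation, **args):
    """Dispatch to the checker for `method`, passing its own arguments."""
    try:
        checker = CHECKERS[method]
    except KeyError:
        raise ValueError("unknown health check method: %r" % method)
    # An argument the checker does not declare raises TypeError here.
    return checker(observation, **args)
```

With this layout, a config would only need to carry the method name plus a dictionary of arguments, and adding a new check type would not touch the existing ones.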


A side question: to me it seems like /health and (/abort & /quitquitquit) are not very
closely related. Does it make sense to have those 3 things grouped in the HealthCheck?


Best,
Florian


