openwhisk-dev mailing list archives

From Markus Thömmes <markusthoem...@apache.org>
Subject Re: Action health checks
Date Wed, 30 Oct 2019 14:03:03 GMT
Increasing latency would be my biggest concern here as well. Even with a
health ping, we can't be sure that the container is still healthy when the
"real request" arrives. To guarantee that, I'd still propose looking at the
possible failure modes and implementing a retry mechanism for them. If you
get a "connection refused" error, I'm fairly certain that it can be retried
without harm. In fact, any error where we can guarantee that we haven't
actually reached the container can be safely retried in the described way.
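
To make the retry idea concrete, here is a minimal sketch in Python. It is
not the invoker's actual code: invoke_with_retry, send, retries and delay_s
are hypothetical names chosen for illustration. The only assumption carried
over from the text is that a refused connection means the request never
reached the container, so repeating it cannot run the action twice.

```python
import time


def invoke_with_retry(send, retries=3, delay_s=0.05):
    """Call send() and retry only when the connection was refused.

    send is a hypothetical zero-argument callable that performs the
    request against the container and returns its response.
    """
    attempt = 0
    while True:
        try:
            return send()
        except ConnectionRefusedError:
            # The TCP connection was never established, so the container
            # cannot have started processing the request.
            attempt += 1
            if attempt > retries:
                raise
            time.sleep(delay_s)
```

Any other error (a timeout after the request was sent, a reset mid-response)
propagates unchanged, because in those cases the container may already have
started executing the action.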

Pre-warmed containers are indeed a somewhat different story. A health
ping as mentioned here would certainly help there, be it just a TCP probe or
even a full-fledged /health call. I'd be fine with either in this case,
as it doesn't affect the critical path.
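
As a rough illustration of the simpler of the two options, a TCP probe can
be sketched in a few lines of Python. The function name tcp_probe and the
timeout_s parameter are made up for this example and do not come from the
PR; the probe only checks that something accepts connections on the port,
not that the action runtime behind it is healthy.

```python
import socket


def tcp_probe(host, port, timeout_s=0.5):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

A full /health call would additionally need an HTTP request and a check of
the response, at the cost of a little more work per probe.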

On Tue, Oct 29, 2019 at 18:00, Tyson Norris
<tnorris@adobe.com.invalid> wrote:

> By "critical path" you mean the path during action invocation?
> The current PR only introduces latency on that path for the case of a
> Paused container changing to Running state (once per transition from Paused
> -> Running).
> In case it isn't clear, this change does not affect any retry (or lack of
> retry) behavior.
>
> Thanks
> Tyson
>
> On 10/29/19, 9:38 AM, "Rodric Rabbah" <rodric@gmail.com> wrote:
>
>     as a longer term point to consider, i think the current model of "best
>     effort at most once" was the wrong design point - if we embraced failure
>     and just retried (at least once), then failure at this level would lead to
>     retries which is reasonable.
>
>     if we added a third health route or introduced a health check, would we
>     increase the critical path?
>
>     -r
>
>     On Tue, Oct 29, 2019 at 12:29 PM David P Grove <groved@us.ibm.com> wrote:
>
>     > Tyson Norris <tnorris@adobe.com.INVALID> wrote on 10/28/2019 11:17:50 AM:
>     > > I'm curious to know what other
>     > > folks think about "generic active probing from invoker" vs "docker/
>     > > mesos/k8s specific integrations for reacting to container failures"?
>     > >
>     >
>     > From a pure maintenance and testing perspective I think a single common
>     > mechanism would be best if we can do it with acceptable runtime overhead.
>     >
>     > --dave
>     >
>
>
>
