openwhisk-dev mailing list archives

From Tyson Norris <>
Subject Re: Invoker HA on Mesos
Date Wed, 28 Mar 2018 02:54:00 GMT
PR is opened here:

On Mar 27, 2018, at 3:25 PM, Tyson Norris <<>>

On Mar 27, 2018, at 2:28 PM, David P Grove <<>>

Tyson Norris <<>> wrote
on 03/27/2018 04:33:48 PM:

We’ve been discussing how to handle Mesos framework HA in the
Invoker, and I created a proposal on the wiki for discussion.

In general, the idea is to allow a single cluster-wide
ContainerPool to operate, while providing reasonable failover
behavior in case of its unexpected death.

To accomplish this, the proposal is to allow parts of the
ContainerPool (freePool and prewarmPool) to be replicated to other
(passive) invoker instances, and to allow the replicated container
meta data to be used by ContainerFactories to resurrect containers
for use in case a failure occurs.

This does a couple of things: it removes the notion of resource
scheduling from the Controller (since there is only ever one invoker),
and it allows the ContainerPool to operate with a holistic view of the
cluster, which is useful for whole-cluster ContainerFactory impls like
I’m curious if the kubernetes folks will also find this useful?

Hi Tyson,

Thanks for writing this up!

A couple of thoughts.
1. Using Akka Distributed Data to actively track the set of
containers to support failure recovery seems like a lot of overhead.  For
Kubernetes, we are labeling all the action containers with their owning
invoker using Kubernetes labels.  So, when an invoker crashes and gets
replaced, one could recover all of its prewarmed & freepool containers with
a simple query against the Kubernetes API server.  No need to track the set
actively; Kube is already doing that via the labels.  Anything similar to
Kube labels in Mesos?

Do you have an example of the labels working? I guess the labels change over time through
the lifecycle of the container?
I know of nothing similar in Mesos, although it could be done in the MesosContainerFactory
for similar purposes.
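For what it’s worth, the label idea can be modeled in a few lines. This is a stdlib-only sketch; `ContainerMeta`, `recover_owned_by`, and the `invoker` label key are invented for illustration and are not the actual Kubernetes or OpenWhisk API:

```python
# Hypothetical sketch: containers carry a label naming their owning invoker,
# so a replacement invoker can recover its predecessor's containers with one
# label query, analogous to `kubectl get pods -l invoker=invoker0`.
from dataclasses import dataclass

@dataclass
class ContainerMeta:
    id: str
    labels: dict

# All containers the orchestrator knows about, labeled by owning invoker.
cluster_containers = [
    ContainerMeta("c1", {"invoker": "invoker0", "prewarm": "true"}),
    ContainerMeta("c2", {"invoker": "invoker0", "prewarm": "false"}),
    ContainerMeta("c3", {"invoker": "invoker1", "prewarm": "false"}),
]

def recover_owned_by(owner):
    """Return every container owned by the crashed invoker."""
    return [c for c in cluster_containers if c.labels.get("invoker") == owner]
```

The point being that the orchestrator already tracks the set, so recovery is a query, not replicated state.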

I think it is partly a question of where to place container re-association logic (in the ContainerPool?
ContainerFactory? Both?), and partly how to implement ContainerPool behavior.
Currently the MesosContainerFactory operates as an embedded Mesos client that must behave as a
singleton with failover, so the ContainerPool needs to do the same.

As for the overhead, I’m not sure whether it is significant. Currently it uses WriteLocal write consistency
to publish updates to the pools, and only for particular combinations (prewarm + PrewarmedData,
and free + WarmedData).
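To make the filter concrete, here is a stdlib-only sketch of which pool/state combinations would be replicated to passive invokers; the state names mirror, but are not, the actual OpenWhisk ContainerData types:

```python
# Sketch: only prewarm entries holding PrewarmedData and free entries holding
# WarmedData are published to passive invokers; everything else stays local.
PREWARMED_DATA = "PrewarmedData"
WARMED_DATA = "WarmedData"
BUSY_DATA = "BusyData"  # in-flight containers are not replicated

REPLICATED = {("prewarm", PREWARMED_DATA), ("free", WARMED_DATA)}

def should_replicate(pool, data):
    """Decide whether a pool update is worth publishing to standbys."""
    return (pool, data) in REPLICATED
```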

I’ll get the PR up shortly.

2.  We've been exploring running with a smaller number of invokers
than worker nodes and cluster-wide scheduling using the
KubernetesContainerFactory + invokerAgent.  However, I don't believe at
production scale a single Invoker for an entire cluster is going to be
viable.  Especially with the current architecture where the action
parameters get streamed through the invoker and the action results get
streamed back through the invoker.  I believe that is going to bottleneck
how many containers a single Invoker can manage.

Yeah, I’m wondering the same thing. For now, operating only one (or a few) controllers has
the same issue, right?

For Mesos, we can operate “multiple invoker clusters”, following the same approach outlined,
but instead of using a single topic invoker0, we have multiple topics invoker1..invokerN
(similar to today), where each topic is consumed by multiple invokers (in active/passive
mode).
Separately, related to performance:
- we still plan to allow concurrency, so things that are required to be fast (blocking activations,
and ones that signal they tolerate it) should leverage this, while things that can tolerate
additional latency should not
- if the ContainerPool performed cleanup with more GC-like semantics, external clients (other
invokers, or the controller) would be able to use existing running containers (at least where
concurrency is tolerated). When all clients of a container have completed (plus some time),
it could be garbage collected by the ContainerPool (similar to how freePool containers currently
linger). This could offload some activation processing from the current invoker workflow.

Of course, your mileage may vary: I’m not sure how well any of that works in a case where
you don’t support concurrent activations, or for cases where the number of actions/containers
exceeds the number of clients (i.e. where everything is only ever used once).
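A tiny sketch of the GC-like semantic described above (the tracking fields and linger window are illustrative assumptions, not existing OpenWhisk code): a container becomes collectible only once every client has completed and a grace period has elapsed:

```python
# Track how many clients currently use a container and when it went idle;
# collect it only when unused and idle longer than the linger window, similar
# to how freePool containers currently linger before removal.
from dataclasses import dataclass

@dataclass
class TrackedContainer:
    active_clients: int
    idle_since_ms: int

def collectible(c, now_ms, linger_ms):
    return c.active_clients == 0 and (now_ms - c.idle_since_ms) >= linger_ms
```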


