openwhisk-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tyson Norris <>
Subject Re: Invoker HA on Mesos
Date Tue, 27 Mar 2018 22:25:59 GMT

> On Mar 27, 2018, at 2:28 PM, David P Grove <> wrote:
> Tyson Norris <> wrote on 03/27/2018 04:33:48 PM:
>> We’ve been discussing how to handle mesos framework HA in the
>> Invoker, and I created a proposal on the wiki to discuss.
> +Invoker+for+HA+on+Mesos
>> In general, the idea is to allow a single cluster-wide/single
>> ContainerPool to operate, while providing a reasonable failover
>> behavior in case of its unexpected death.
>> To accomplish this, the proposal is to allow parts of the
>> ContainerPool (freePool and prewarmPool) to be replicated to other
>> (passive) invoker instances, and to allow the replicated container
>> meta data to be used by ContainerFactories to resurrect containers
>> for use in case a failure occurs.
>> This does a couple things, like removing the notion of resource
>> scheduling from the Controller (since there is only ever 1 invoker),
>> and allows the ContainerPool to operate with a holistic view of the
>> cluster, useful for whole-cluster ContainerFactory impls like
>> MesosContainerFactory.
>> I’m curious if the kubernetes folks will also find this useful?
> Hi Tyson,
> Thanks for writing this up!
> A couple of thoughts.
> 	1. Using Akka Distributed Data to actively track the set of
> containers to support failure recovery seems like a lot of overhead.  For
> Kubernetes, we are labeling all the action containers with their owning
> invoker using Kubernetes labels.  So, when an invoker crashes and gets
> replaced, one could recover all of its prewarmed & freepool containers with
> a simple query against the Kubernetes API server.  No need to track the set
> actively; Kube is already doing that via the labels.  Anything similar to
> Kube labels in Mesos?

Do you have an example of the labels working? I guess the labels are changed over time through
the lifecycle of the container? 
Nothing I know of similar in mesos, although it could be done in the MesosContainerFactory
for similar purposes. 

I think it is partly a question of where to place container re-association logic (in the ContainerPool?
ContainerFactory? Both?), and partly how to implement ContainerPool behavior. 
Currently MesosContainerFactory operates as an embedded mesos client that must behave as a
singleton with failover, so ContainerPool needs to do the same 

As for the overhead, I’m not sure it is, or isn’t. Currently it is using WriteLocal writeConsistency
to publish updates to the pools, and only on particular combinations (prewarm+PrewarmedData,
and free+WarmedData)   

I’ll get the PR up shortly. 

> 	2.  We've been exploring running with a smaller number of invokers
> than worker nodes and cluster-wide scheduling using the
> KubernetesContainerFactory + invokerAgent.  However, I don't believe at
> production scale a single Invoker for an entire cluster is going to be
> viable.  Especially with the current architecture where the action
> parameters get streamed through the invoker and the action results get
> streamed back through the invoker.  I believe that is going to bottleneck
> how many containers a single Invoker can manage.

Yeah I’m wondering the same thing. For now, operating only one (or a few) controllers has
the same issue right? 

For mesos, we can operate "multiple invoker clusters”, following the same approach outlined,
but instead of using a single topic invoker0, we have multiple topics invoker1..invokern,
(similar to today), but where each topic is consumed by multiple invokers (in active/passive

Separately, related to performance:
- we still plan to allow concurrency so things that are required to be fast (blocking activations,
and ones that signal they tolerate it) should leverage this, and things that can tolerate
additional latency should not
- if the ContainerPool operated cleanup on a more GC-like semantic, external clients (other
invokers, or the controller) would be able to use existing running containers (at least where
concurrency is tolerated). When all clients of a container have completed (plus some time),
it could be garbage collected by the ContainerPool (similar to how freePool containers currently
linger). This could unburden some activation processing from the current invoker workflow.

Of course, your mileage may vary: I’m not sure how well any of that works in a case where
you don’t support concurrent activations, or for cases where the number of actions/containers
exceeds the number of clients (i.e. where everything is only ever used once).


> --dave

View raw message