openwhisk-dev mailing list archives

From "David P Grove" <>
Subject Re: Invoker HA on Mesos
Date Tue, 27 Mar 2018 21:28:13 GMT
Tyson Norris <> wrote on 03/27/2018 04:33:48 PM:
> We’ve been discussing how to handle mesos framework HA in the
> Invoker, and I created a proposal on the wiki to discuss.
> In general, the idea is to allow a single cluster-wide/single
> ContainerPool to operate, while providing a reasonable failover
> behavior in case of its unexpected death.
> To accomplish this, the proposal is to allow parts of the
> ContainerPool (freePool and prewarmPool) to be replicated to other
> (passive) invoker instances, and to allow the replicated container
> metadata to be used by ContainerFactories to resurrect containers
> for use in case a failure occurs.
> This does a couple things, like removing the notion of resource
> scheduling from the Controller (since there is only ever 1 invoker),
> and allows the ContainerPool to operate with a holistic view of the
> cluster, useful for whole-cluster ContainerFactory impls like
> MesosContainerFactory.
> I’m curious if the kubernetes folks will also find this useful?

Hi Tyson,

Thanks for writing this up!

A couple of thoughts.
	1. Using Akka Distributed Data to actively track the set of
containers to support failure recovery seems like a lot of overhead.  For
Kubernetes, we are labeling all the action containers with their owning
invoker using Kubernetes labels.  So, when an invoker crashes and gets
replaced, one could recover all of its prewarmed & freepool containers with
a simple query against the Kubernetes API server.  There is no need to track
the set actively; Kube already does that via the labels.  Is there anything
similar to Kube labels in Mesos?

	2.  We've been exploring running with a smaller number of invokers
than worker nodes and cluster-wide scheduling using the
KubernetesContainerFactory + invokerAgent.  However, I don't believe a
single Invoker for an entire cluster is going to be viable at production
scale, especially with the current architecture, where the action
parameters get streamed through the invoker and the action results get
streamed back through it.  I believe that is going to bottleneck how many
containers a single Invoker can manage.
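
To make point 1 concrete, here is a minimal sketch of label-based recovery.
It does not call the real Kubernetes API. The Pod class, the "invoker" and
"pool" label keys, and the pod names are all hypothetical stand-ins; the
filter only mimics what an equality-based label selector would return from
the API server.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    """Hypothetical stand-in for a pod record returned by the API server."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


def select_by_labels(pods: List[Pod], selector: Dict[str, str]) -> List[Pod]:
    """Return pods whose labels contain every key/value pair in selector."""
    return [
        p for p in pods
        if all(p.labels.get(k) == v for k, v in selector.items())
    ]


def recover_containers(pods: List[Pod], invoker_id: str) -> Dict[str, List[str]]:
    """Rebuild one invoker's prewarm and free pools from labels alone."""
    # Assumed labeling scheme: "invoker" names the owner, "pool" the pool kind.
    owned = select_by_labels(pods, {"invoker": invoker_id})
    return {
        "prewarm": [p.name for p in owned if p.labels.get("pool") == "prewarm"],
        "free": [p.name for p in owned if p.labels.get("pool") == "free"],
    }


# Example cluster state; names and labels are made up for illustration.
cluster = [
    Pod("wsk-nodejs-1", {"invoker": "invoker0", "pool": "prewarm"}),
    Pod("wsk-python-7", {"invoker": "invoker0", "pool": "free"}),
    Pod("wsk-nodejs-2", {"invoker": "invoker1", "pool": "prewarm"}),
]

# A replacement for the crashed invoker0 asks only for its own containers.
recovered = recover_containers(cluster, "invoker0")
```

The replacement invoker keeps no replicated state of its own in this sketch;
the cluster manager's label index is the only record of ownership.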

