openwhisk-dev mailing list archives

From Markus Thömmes <>
Subject Re: Proposal on a future architecture of OpenWhisk
Date Wed, 15 Aug 2018 07:45:44 GMT
Hi Dave,

thanks a lot for your input! Greatly appreciated.

On Tue, Aug 14, 2018 at 11:15 PM David P Grove <

> "Markus Thömmes" <> wrote on 08/14/2018 10:06:49
> AM:
> >
> > I just published a revision on the initial proposal I made. I still owe a
> > lot of sequence diagrams for the container distribution, sorry for taking
> > so long on that, I'm working on it.
> >
> > I did include a clear separation of concerns into the proposal, where
> > user-facing abstractions and the execution (load balancing, scaling) of
> > functions are loosely coupled. That enables us to exchange the execution
> > system while not changing anything in the Controllers at all (to an
> > extent). The interface to talk to the execution layer is HTTP.
> >
> Nice writeup!
> For me, the part of the design I'm wondering about is the separation of the
> ContainerManager and the ContainerRouter and having the ContainerManager be
> a cluster singleton. With Kubernetes blinders on, it seems more natural to
> me to fuse the ContainerManager into each of the ContainerRouter instances
> (since there is very little to the ContainerManager except (a) talking to
> Kubernetes and (b) keeping track of which Containers it has handed out to
> which ContainerRouters -- a task which is eliminated if we fuse them).
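To make the bookkeeping Dave mentions concrete, here is a minimal sketch of the two ContainerManager responsibilities, (a) talking to the cluster and (b) tracking which containers were handed to which ContainerRouters. All names are hypothetical illustrations, not actual OpenWhisk APIs; fusing the manager into each router would make the tracking map unnecessary.

```python
# Hypothetical sketch of ContainerManager bookkeeping; names are
# illustrative, not part of OpenWhisk.
class ContainerManager:
    def __init__(self, create_container):
        # create_container stands in for a Kubernetes client call
        self.create_container = create_container
        self.assignments = {}  # container id -> router id

    def request_container(self, router_id, kind):
        container_id = self.create_container(kind)   # (a) talk to Kubernetes
        self.assignments[container_id] = router_id   # (b) track the hand-out
        return container_id

    def containers_of(self, router_id):
        return [c for c, r in self.assignments.items() if r == router_id]

# usage with a fake cluster backend
counter = iter(range(100))
mgr = ContainerManager(lambda kind: f"{kind}-{next(counter)}")
c = mgr.request_container("router-0", "nodejs:6")
print(mgr.containers_of("router-0"))  # → ['nodejs:6-0']
```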

As you say below, the main concern is dealing with the edge-case I laid out.

> The main challenge is dealing with your "edge case" where the optimal
> number of containers to create to execute a function is less than the
> number of ContainerRouters.  I suspect this is actually an important case
> to handle well for large-scale deployments of OpenWhisk.  Having 20ish
> ContainerRouters on a large cluster seems plausible, and then we'd expect a
> long tail of functions where the optimal number of container instances is
> less than 20.

I agree, in large scale environments that might well be an important case.

> I wonder if we can partially mitigate this problem by doing some amount of
> smart routing in the Controller.  For example, the first level of routing
> could be based on the kind of the action (nodejs:6, python, etc).  That
> could then vector to per-runtime ContainerRouters which dynamically
> auto-scale based on load.  Since there doesn't have to be a fixed division
> of actual execution resources to each ContainerRouter this could work.  It
> also lets us easily keep stemcells for multiple runtimes without worrying
> about wasting too many resources.
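The kind-based first level of routing sketched above might look roughly like this. Everything here is an assumption for illustration (class and endpoint names are invented), not an actual Controller API:

```python
# Hypothetical sketch: the Controller vectors an activation to a
# per-runtime pool of ContainerRouters based on the action's kind.
import random

class KindRouter:
    def __init__(self):
        # kind (e.g. "nodejs:6", "python") -> list of router endpoints
        self.routers_by_kind = {}

    def register(self, kind, router):
        self.routers_by_kind.setdefault(kind, []).append(router)

    def route(self, action_kind):
        """Pick a ContainerRouter that serves this runtime kind."""
        routers = self.routers_by_kind.get(action_kind)
        if not routers:
            raise LookupError(f"no router pool for kind {action_kind}")
        # the second-level choice can stay random, per the premise below
        return random.choice(routers)

table = KindRouter()
table.register("nodejs:6", "router-nodejs-0")
table.register("nodejs:6", "router-nodejs-1")
table.register("python", "router-python-0")
print(table.route("python"))  # → router-python-0
```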

The premise I wanted to keep in my proposal is that you can route
essentially randomly between the routers. That's also why I use the
overflow queue essentially as a work-stealing queue, to balance load
between the routers if the discrepancies get too high.

My general gut feeling as to what can work here is: keep state local as
long as you can (in the individual ContainerRouters) to make the hot path
as fast as possible, and fall back to work-stealing (slower, more
constrained) once things get out of bounds.
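The "local first, steal on overflow" idea above can be sketched as follows: each router serves requests from its own warm-container pool (the hot path) and only pushes to a shared overflow queue when it has no capacity, while idle routers steal from that queue. The structure and names are assumptions for illustration only, not the proposed implementation.

```python
# Minimal sketch of local-first routing with a work-stealing
# overflow queue; all names are hypothetical.
from collections import deque

class Router:
    def __init__(self, name, warm_containers, overflow):
        self.name = name
        self.warm = deque(warm_containers)  # local state, hot path
        self.overflow = overflow            # shared, slower path

    def handle(self, activation):
        if self.warm:
            container = self.warm.popleft()
            return (self.name, container, activation)
        # out of local capacity: hand off to the overflow queue
        self.overflow.append(activation)
        return None

    def steal(self):
        """An idle router pulls work that others could not place locally."""
        if self.overflow and self.warm:
            activation = self.overflow.popleft()
            return self.handle(activation)
        return None

overflow = deque()
busy = Router("r1", [], overflow)      # no warm containers left
idle = Router("r2", ["c1"], overflow)
busy.handle("act-1")                   # no capacity: goes to overflow
stolen = idle.steal()                  # r2 steals and executes it
print(stolen)  # → ('r2', 'c1', 'act-1')
```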

> How do you want to deal with design alternatives?  Should I be adding to
> the wiki page?  Doing something else?

Good question. Feels like we could break a "Routing" Work Group out of
this? Part of my proposal was to build this out collaboratively. Maybe we
can try to find consensus on some general points (direct HTTP connection to
containers should be part of it, we'll need an overflow queue) and once/if
we agree on the general broader picture, we can break out discussions on
individual aspects of it? Would that make sense?

> --dave
