openwhisk-dev mailing list archives

From Chetan Mehrotra <chetan.mehro...@gmail.com>
Subject Re: Proposal on a future architecture of OpenWhisk
Date Fri, 20 Jul 2018 12:13:06 GMT
Hi Markus,

Some of the points below are more of a hunch and thinking out loud, and
hence may not be fully objective or valid.

> The buffer I described above is a per action "invoke me once resources are available" buffer

> 1. Do we need to persist an in-memory queue that waits for resources to be created by the ContainerManager?
> 2. Do we need a shared queue between the Controllers to enable work-stealing in cases where multiple Controllers wait for resources?

As activations may be bulky (1 MB max), it may not be possible to keep
them in memory even at a small incoming rate, depending on how fast
they get consumed. Currently the use of Kafka takes the pressure off
the Controllers and helps keep them stable. So I suspect we may need
to make use of the buffer more often to keep pressure off the
Controllers, especially for heterogeneous loads and when the system is
making full use of cluster capacity.
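To put a rough number on it (the 1 MB limit is from the current
system; the 5,000 figure is only an illustration): 5,000 buffered
activations of the maximum size already mean ~5 GB of Controller heap,
so spilling to a persistent buffer would be needed well before the
incoming rate looks extreme.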

> That could potentially open up the possibility to use a technology more geared towards Pub/Sub, where subscribers per action are cheaper to implement than on Kafka?

That would certainly be one option to consider. A system like RabbitMQ
can support lots of queues, but it may pose the problem of how to
efficiently consume from all of those queues.

Another option I was thinking of is a design similar to the current
one, involving a controller and invokers, but with a single queue
shared between the controller and the invokers, using the action as
the partition key and having multiple invokers form a consumer group.
This enables using Kafka's built-in support for horizontal scaling:
add a new invoker and Kafka assigns partitions to it.
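
For illustration, a minimal sketch of publishing to such a shared queue
with the plain Kafka producer API (the "activations" topic name and the
helper are assumptions for this example, not existing OpenWhisk code):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object ActivationProducer {
      // One shared topic for all actions; the fully qualified action name is the
      // record key, so Kafka's default partitioner maps an action to one partition.
      private val props = new Properties()
      props.put("bootstrap.servers", "kafka:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")

      val producer = new KafkaProducer[String, Array[Byte]](props)

      def publish(actionFqn: String, activation: Array[Byte]): Unit =
        producer.send(new ProducerRecord[String, Array[Byte]]("activations", actionFqn, activation))
    }

Adding a new invoker to the consumer group then just triggers a normal
Kafka rebalance; no extra scheduling logic is needed on the producing side.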

Such a design would have the following key aspects:

A - Supporting Queue Per Action (sort of!)
--------------------------------------------
One way to do that is to have a topic per action, which does not work
well with Kafka. Instead, each Invoker can maintain a dedicated
ContainerPool per action, which independently reads messages from the
partition hosting its action, filtering out activations for other
actions that happen to land on the same partition. An Invoker would
spin off such an independent container pool once it observes that the
activation rate for an action crosses a certain threshold. We can
leverage this per-action pool for better autoscaling, where it
requests more containers from the container orchestrator (k8s, Mesos)
based on the rate of activations and the lag. Here the system can
learn from the usage pattern how to allocate resources.
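
A rough sketch of such a per-action reader, assuming a single shared
"activations" topic keyed by action (class and method names are just
illustrations, not existing code):

    import java.time.Duration
    import java.util.{Collections, Properties}
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    // A dedicated reader the Invoker spins off once an action crosses the
    // activation-rate threshold; it pins itself to the partition hosting the
    // action and simply skips records that belong to other actions.
    class PerActionPoolReader(actionFqn: String, partition: Int) {
      private val props = new Properties()
      props.put("bootstrap.servers", "kafka:9092")
      props.put("group.id", s"pool-$actionFqn")
      props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")

      private val consumer = new KafkaConsumer[String, Array[Byte]](props)
      consumer.assign(Collections.singletonList(new TopicPartition("activations", partition)))

      // Returns the next batch of activations for this action only.
      def poll(): Seq[Array[Byte]] =
        consumer.poll(Duration.ofMillis(500)).asScala.toSeq
          .filter(_.key() == actionFqn)
          .map(_.value())
    }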

This may lead to multiple streamings/replays of the same message from
Kafka, but given the high rate at which messages can be fetched from a
single partition, it may work out fine. It would also require some
custom offset management, where the primary offset belongs to the
consumer bound to the Invoker, but offsets are also recorded per action.
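
As a sketch of that custom offset management (a simplification,
assuming each per-action pool tracks its own next-unprocessed offset):
the offset the Invoker-bound consumer actually commits for a partition
must not go beyond the slowest per-action pool, otherwise a restart
could skip activations.

    // Hypothetical helper: given the next-unprocessed offset of every per-action
    // pool reading a partition, the partition-level commit is their minimum.
    def committableOffset(perActionOffsets: Map[String, Long]): Option[Long] =
      if (perActionOffsets.isEmpty) None else Some(perActionOffsets.values.min)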

B - Hot Partition Handling
----------------------------------

A known drawback of non-random partitioning is hot partitions, whereby
some Invokers may get very little work while others get lots of it.
This would be managed by two things:

1. High container density per Invoker - If we can reduce the Invoker
state, it should be possible for a single Invoker to handle many more
containers concurrently. So even a hot Invoker would still be able to
make use of containers distributed across multiple hosts and thus make
optimal use of cluster resources.

2. Reassigning activations - If an action-specific container pool sees
that it is not able to keep up with the rate of incoming activations
for its action, it can pull activations out and add them back to the
same queue, but assigned directly to a less loaded invoker's partition
(see the sketch below).
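
A sketch of what such a reassignment could look like, reusing a
producer like the one above; the (topic, partition, key, value)
ProducerRecord constructor lets us bypass the key-based partitioner and
name the target partition explicitly (how the less loaded partition is
chosen is left out here):

    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // Illustrative only: push the activation back onto the same topic, but onto
    // the partition owned by a less loaded invoker instead of the hot one.
    def reassign(producer: KafkaProducer[String, Array[Byte]],
                 actionFqn: String,
                 activation: Array[Byte],
                 targetPartition: Int): Unit =
      producer.send(
        new ProducerRecord[String, Array[Byte]]("activations", targetPartition, actionFqn, activation))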

The thoughts above digress from the current proposal being discussed
and may not be presented very coherently. However, I wanted to write
them down to see whether they make any sense. If there is some
potential, I can try to draft a more detailed proposal on the wiki.
The key motive here is to get to a design where independent container
pools pull activations independently and autoscale themselves within
the allowed limits, getting us closer to a consumer-per-action-topic
kind of design in a dynamic way!

Chetan Mehrotra

On Thu, Jul 19, 2018 at 6:06 PM Markus Thoemmes
<markus.thoemmes@de.ibm.com> wrote:
>
> Hi Chetan,
>
> >Currently one aspect which is not clear is whether the Controller has access to
> >
> >1. A pool of prewarm containers - containers of the base image where /init has not yet been done, so these containers can then be initialized within the Controller
> >2. OR a pool of warm containers bound to a specific user+action. These containers would possibly have been initialized by the ContainerManager, which then allocates them to a controller.
>
> The latter case is what I had in mind. The controller only knows containers that are already ready to call /run on.
>
> Pre-Warm containers are an implementation detail to the Controller. The ContainerManager can keep them around to be able to answer demand for specific resources more quickly, but the Controller doesn't care. It only knows warm containers.
>
> >Can you elaborate on this a bit more, i.e. how the scale-up logic would work and whether it is asynchronous?
> >
> >I think the above aspect (the type of pool) has a bearing on the scale-up logic. If an action was not in use so far, then when the first request comes (i.e. the 0-to-1 scale-up case), would the Controller ask the ContainerManager for a specific action container, wait for its setup and then execute it? OR, if it has a generic pool, would it take one, initialize it and use it? And if this is not done synchronously, would such an action be put on the overflow queue?
>
> In this specific example, the Controller will request a container from the ContainerManager and buffer the request until it finally has capacity to execute it. All subsequent requests will be put on the same buffer and a Container will be requested for each of them.
>
> Whether we put this buffer in an overflow queue (aka persist it) remains to be decided. If we keep it in memory, we have roughly the same guarantees as today. As Rodric mentioned though, we can improve certain failure scenarios (like waiting for a container in this case) by making this buffer more persistent. I'm not mentioning Kafka here for a reason, because in this case any persistent buffer is just fine.
>
> Also note that this is not necessarily the overflow queue. The overflow queue is used for arbitrary requests once the ContainerManager cannot create more resources and requests thus need to wait.
>
> The buffer I described above is a per action "invoke me once resources are available" buffer, which could potentially be designed to be per Controller to avoid the challenge of scaling it out. That of course has its downsides in itself, for instance: a buffer that spans all Controllers would enable work-stealing between Controllers with missing capacity and could mitigate some of the load imbalances that Dominic mentioned. We are then entering the same area that his proposal enters: the need for a queue per action.
>
> Conclusion is, we have 2 perspectives to look at this:
>
> 1. Do we need to persist an in-memory queue that waits for resources to be created by the ContainerManager?
> 2. Do we need a shared queue between the Controllers to enable work-stealing in cases where multiple Controllers wait for resources?
>
> An important thing to note here: Since all of this is no longer happening on the critical path (stuff gets put on the queue only if it needs to wait for resources anyway), we can afford a solution that isn't as performant as Kafka might be. That could potentially open up the possibility to use a technology more geared towards Pub/Sub, where subscribers per action are cheaper to implement than on Kafka?
>
> Does that make sense? Hope that helps :). Thanks for the questions!
>
> Cheers,
> Markus
>
