openwhisk-dev mailing list archives

From "Markus Thoemmes" <markus.thoem...@de.ibm.com>
Subject Re: New scheduling algorithm proposal.
Date Tue, 08 May 2018 08:25:51 GMT
Hey Dominic,

Thank you for the very detailed writeup. Since there is a lot in here, please allow me to
rephrase some of your proposals to see if I understood correctly. I'll go through point-by-point
to try to keep it close to your proposal.

**Note:** This is the result of an extensive discussion between Christian Bickel (@cbickel) and me
about this proposal. I used "I" throughout the writeup for easier readability, but all of it
can be read as "we".

# Issues:

## Interventions of actions.

That's a valid concern when using today's loadbalancer. This is noisy-neighbor behavior that
can happen today under the circumstances you describe.

## Does not wait for previous run.

True today as well. The algorithms used so far value correctness over performance. You're
right that you could track the expected queue occupation and schedule accordingly. That does
have its own risks though (what if your action has very spiky latency behavior?).

I'd generally propose to break this out into a separate discussion. It doesn't really correlate
with the other points, WDYT?
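
To make the spiky-latency risk concrete, here is a minimal sketch of occupancy-based scheduling. It is not taken from the proposal or from the OpenWhisk code; the function names, the "queue length times recent mean duration" estimate and the sample numbers are all assumptions for illustration.

```python
from statistics import mean


def estimated_wait_ms(queue_length, recent_durations_ms):
    # Assumption: every queued activation takes about the recent mean duration.
    return queue_length * mean(recent_durations_ms)


def pick_least_occupied(containers):
    # containers maps a (hypothetical) container name to
    # (queue_length, recent_durations_ms).
    return min(containers, key=lambda name: estimated_wait_ms(*containers[name]))


steady = [100, 100, 100, 100]  # every run takes 100 ms
spiky = [10, 10, 10, 370]      # same 100 ms mean, very different worst case

containers = {"a": (2, steady), "b": (1, spiky)}
choice = pick_least_occupied(containers)
print(choice, estimated_wait_ms(*containers[choice]))
```

The estimate prefers container "b" because its expected wait is lower, but if the activation queued there turns out to be one of the slow runs, the new request waits longer than it would have on "a". That is the kind of misprediction spiky latency can cause.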

## Invoker coordinates all requests.

I tend to disagree with the "cannot take advantage of parallel processing" bit. Everything
in the invoker is parallelized after updating its central state (which should take a **very**
short amount of time relative to actual action runtime). That said, the invoker is not really
optimized to scale to a lot of containers *yet*.

## Not able to accurately control concurrent invocation.

Well, the limits are "concurrent actions in the system". You should be able to get 5 activations
on the queue with today's mechanism. You should get as many containers as needed to handle
your load. For very short-running actions, you might not need N containers to handle N messages
in the queue.
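
As a rough illustration of why N queued messages do not imply N containers, here is a back-of-the-envelope estimate based on Little's law (average concurrency = arrival rate × duration). This is an assumed steady-state model, not how the loadbalancer actually sizes things, and it ignores bursts and cold starts; the function name and numbers are made up.

```python
import math


def containers_needed(arrivals_per_second, duration_ms):
    # Little's law: average number of activations in flight
    # = arrival rate * average duration. Rounded up to whole containers.
    return math.ceil(arrivals_per_second * duration_ms / 1000)


short_running = containers_needed(100, 10)    # 100 req/s, 10 ms each
long_running = containers_needed(100, 2000)   # 100 req/s, 2 s each
print(short_running, long_running)
```

Under these assumptions, the short-running action is served by a single container at the same request rate that needs 200 containers for the long-running one.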

## TPS is not deterministic.

I'm wondering: Has TPS been deterministic for just one user? I'd argue that this is a valid
metric in its own right. I agree that these numbers can drop significantly under heterogeneous
load.

# Proposal:

I'll try to rephrase and add some bits of abstraction here and there to see if I understood
this correctly:

The controller should schedule based on individual actions. It should not send those to an
arbitrary invoker but rather to something that identifies those actions themselves (a Kafka
topic in your example). I'll call this *PerActionContainerPool*. Those calls from the controller
will be handled by each *ContainerProxy* directly rather than being threaded through another
"centralized" component (the invoker). The *ContainerProxy* is responsible for handling the
"aftermath": writing activation records, collecting logs, etc. (like today).

Iff the controller thinks that the existing containers cannot sustain the load (i.e. if all
containers are currently in use), it advises a *ContainerCreationSystem* (all invokers combined
in your case) to create a new container. This container will be added to the *PerActionContainerPool*.

The invoker in your proposal has no scheduling logic at all (which is consistent with the issues
outlined above) other than container creation itself.
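
The flow rephrased above can be sketched in a technology-neutral way as follows. This is only an illustration of the abstraction, not the proposal's implementation: the class and method names are hypothetical, Kafka is left out entirely, and "all containers busy" is modelled with a simple flag.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass
class ContainerProxy:
    """Hypothetical stand-in for a container bound to exactly one action."""
    container_id: int
    action: str
    busy: bool = False

    def run(self, payload):
        # A real proxy would also handle the "aftermath" here:
        # writing the activation record, collecting logs, etc.
        self.busy = True
        return f"{self.action}#{self.container_id} <- {payload}"

    def release(self):
        self.busy = False


class ContainerCreationSystem:
    """Hypothetical: all invokers combined; creates containers, schedules nothing."""

    def __init__(self):
        self._ids = count(1)

    def create(self, action):
        return ContainerProxy(next(self._ids), action)


class Controller:
    def __init__(self, creation_system):
        self.creation_system = creation_system
        # One pool per action: the PerActionContainerPool of the text.
        self.pools = defaultdict(list)

    def schedule(self, action, payload):
        pool = self.pools[action]
        proxy = next((p for p in pool if not p.busy), None)
        if proxy is None:
            # All existing containers are in use: ask for a new one.
            proxy = self.creation_system.create(action)
            pool.append(proxy)
        return proxy, proxy.run(payload)


controller = Controller(ContainerCreationSystem())
first, result = controller.schedule("hello", "a")
second, _ = controller.schedule("hello", "b")  # first is busy, so a new container
first.release()
third, _ = controller.schedule("hello", "c")   # reuses the warm container
print(result, len(controller.pools["hello"]), third is first)
```

In this sketch a new container is only requested when every container in the action's pool is busy, and a released container is reused before anything new is created, which is where the improved warm-container usage comes from.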

# Conclusion:

I like the proposal in the abstract way I've tried to phrase above. It indeed amplifies warm-container
usage and in general should be superior to the more statistical approach of today's loadbalancer.

I think we should discuss this proposal in an abstract, non-technology-bound way. I do think
that having so many Kafka topics, including all the rebalancing needed, can become an issue,
especially because the sheer number of Kafka topics is unbounded. I also think that the consumer
lag is subject to eventual consistency and, depending on how eventual that is, it can turn into
queueing in your system, even though that wouldn't be necessary from a capacity perspective.

I don't want to ditch the proposal because of those concerns though!

As I said: The proposal itself makes a lot of sense and I like it a lot! Let's not trap ourselves
in the technology used today though. You're proposing a major restructuring so we might as
well think more green-fieldy. WDYT?

Cheers,
Christian and Markus

