openwhisk-dev mailing list archives

From Tyson Norris <tnor...@adobe.com.INVALID>
Subject Re: Introducing sharding as an alternative for state sharing
Date Wed, 28 Feb 2018 20:57:27 GMT
One related question: how is throttling handled?

Looking at the code, it appears each controller instance will maintain its own stats (totalActivations, activationsPerNamespace) independently.
Am I missing something?
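
For concreteness, here is roughly the pattern I'm reading (my own sketch, not the actual code):

    // Illustrative only: per-controller bookkeeping with no sharing.
    class LocalActivationStats {
      private var totalActivations: Long = 0L
      private val activationsPerNamespace =
        scala.collection.mutable.Map.empty[String, Long].withDefaultValue(0L)

      def record(namespace: String): Unit = synchronized {
        totalActivations += 1
        activationsPerNamespace(namespace) += 1
      }
    }

If that's right, with N controllers each counting only its own traffic, a per-namespace limit would presumably have to be divided by N to hold globally.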

Thanks
Tyson

> On Feb 1, 2018, at 7:17 AM, Stephen Fink <fink.stephen@gmail.com> wrote:
> 
> Perfect.  Thanks.
> 
> SJF
> 
>> On Feb 1, 2018, at 10:12 AM, Markus Thoemmes <markus.thoemmes@de.ibm.com> wrote:
>> 
>> Hi Steve,
>> 
>> Fair point, and that's exactly what we're trying to exploit here: while the loadbalancer only sees, for example, 8 slots on that invoker, the invoker's logic remains unchanged. It always sees the full picture of its state and doesn't care where the load is coming from. So OpenWhisk would (given the load is low enough) only provision one container for this action. Both loadbalancers would operate using the same hashes and stepSizes and thus choose the very same invoker (given no other load in the system).
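>> 
>> To illustrate (hypothetical names, not the actual code), the choice both loadbalancers make is a pure function of the action, roughly:
>> 
>>     // Hash the action's fully-qualified name to a home invoker and a step
>>     // size, then probe by that stride until a healthy invoker is found.
>>     // Step sizes coprime to the invoker count guarantee full coverage.
>>     def schedule(fqn: String, healthy: IndexedSeq[Boolean], stepSizes: IndexedSeq[Int]): Option[Int] =
>>       if (healthy.isEmpty || stepSizes.isEmpty) None
>>       else {
>>         val hash = fqn.hashCode & Int.MaxValue // non-negative hash
>>         val home = hash % healthy.size
>>         val step = stepSizes(hash % stepSizes.size)
>>         (0 until healthy.size).map(i => (home + i * step) % healthy.size).find(healthy)
>>       }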
>> 
>> Cheers.
>> 
>> -----Stephen Fink <fink.stephen@gmail.com> wrote: -----
>> To: dev@openwhisk.apache.org
>> From: Stephen Fink <fink.stephen@gmail.com>
>> Date: 02/01/2018 03:57PM
>> Subject: Re: Introducing sharding as an alternative for state sharing
>> 
>> Hi Markus,
>> 
>> Suppose I have a client which fires one action repeatedly in a loop. So the server handles one invocation of this action at a time.
>> 
>> Under the current system, OW will provision one container for this action, and it will get reused for every invocation.
>> 
>> Under horizontal sharding with factor N, it seems OW will provision N containers, which is sub-optimal for several reasons (cold starts, memory footprint, consumption of stem cells).
>> 
>> Do you agree?
>> 
>> If so, I wonder if there's a hybrid strategy: suppose the load balancer operates via horizontal sharding, but the invoker-side logic remains unchanged. Then an invoker can continue to use all slots available to maximize container reuse, while the load balancer avoids shared state.
>> 
>> Make sense?
>> 
>> SJF
>> 
>> 
>>> On Feb 1, 2018, at 9:25 AM, Markus Thoemmes <markus.thoemmes@de.ibm.com> wrote:
>>> 
>>> Hi folks,
>>> 
>>> We (Christian Bickel and I) just opened a pull request for comments on introducing a notion of sharding instead of sharing state between our controllers (loadbalancers). It also addresses a few deficiencies of the old loadbalancer, to remove any kind of bottleneck there and make it as fast as possible.
>>> 
>>> Commit message for posterity:
>>> 
>>> The current ContainerPoolBalancer suffers from a couple of problems and bottlenecks:
>>> 
>>> 1. Inconsistent state: The data structures keeping the state for that loadbalancer are not handled in a thread-safe way, meaning there can be queuing on some invokers even though there is free capacity on other invokers.
>>> 2. Asynchronously shared state: Sharing the state is needed for a highly available deployment of multiple controllers and for horizontal scaling across them. Said state-sharing makes point 1 even worse and isn't anywhere near fast enough to schedule quick bursts efficiently.
>>> 3. Bottlenecks: Getting the state from the outside (like for the ActivationThrottle) is a very costly operation (at least in the shared-state case) and actually bottlenecks the whole invocation path. Getting the current state of the invokers is a second bottleneck, where one request is made to the corresponding actor for each invocation.
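>>> 
>>> To illustrate point 3 with a hypothetical Akka sketch (my names, not the actual code):
>>> 
>>>     import akka.actor.ActorRef
>>>     import akka.pattern.ask
>>>     import akka.util.Timeout
>>>     import scala.concurrent.Future
>>>     import scala.concurrent.duration._
>>> 
>>>     case object GetState // hypothetical query message
>>> 
>>>     object StateQuery {
>>>       implicit val timeout: Timeout = Timeout(1.second)
>>> 
>>>       // One ask per invocation: every schedule pays a round-trip through
>>>       // the state actor's mailbox, serializing the whole invocation path.
>>>       def invokerState(stateActor: ActorRef): Future[Any] =
>>>         stateActor ? GetState
>>>     }
>>> 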
>>> This new implementation aims to solve the problems mentioned above as follows:
>>> 
>>> 1. All state is local: There is no shared state. Resources are managed through horizontal sharding. Horizontal sharding means: the invokers' slots are evenly divided between the loadbalancers in existence. If we deploy 2 loadbalancers and each invoker has 16 slots, each loadbalancer will have access to 8 slots on each invoker.
>>> 2. Slots are given away atomically: When scheduling an activation, a slot is immediately assigned to that activation (implemented through semaphores; see the sketch after this list). That means that even under concurrent scheduling, an invoker will not be overloaded as long as there is capacity left on it.
>>> 3. Asynchronous updates of slow data: Slowly changing data, like a change in an invoker's state, is handled asynchronously and applied to a local copy of the state. Querying that state is as cheap as it can be.
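>>> 
>>> As a sketch of points 1 and 2 together (a simplified model with my own names; java.util.concurrent.Semaphore stands in for the semaphore we actually use):
>>> 
>>>     import java.util.concurrent.Semaphore
>>> 
>>>     // Each loadbalancer owns an equal share of this invoker's slots and
>>>     // hands them out atomically, with no cross-controller coordination.
>>>     class ShardedInvokerSlots(totalSlots: Int, numLoadbalancers: Int) {
>>>       private val shardSize = totalSlots / numLoadbalancers // e.g. 16 / 2 = 8
>>>       private val slots = new Semaphore(shardSize)
>>> 
>>>       /** Claim a slot atomically; false means this shard is exhausted. */
>>>       def tryAcquire(): Boolean = slots.tryAcquire()
>>> 
>>>       /** Return the slot once the activation has completed. */
>>>       def release(): Unit = slots.release()
>>>     }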
>>> 
>>> A few words on the implementation details:
>>> 
>>> We chose horizontal sharding (evenly dividing the capacity of each invoker) over vertical sharding (evenly dividing the invokers as a whole) mainly for the sake of staging these changes. Once we divide vertically, we'll need a good loadbalancing strategy in front of the controllers themselves to keep unnecessary cold starts to a minimum and maximize container reuse. By dividing horizontally, we maintain the same reuse policies as today and can even keep the same loadbalancing strategies intact. Horizontal sharding of course only scales so far (maybe to 4 controllers, assuming 16 slots on each invoker), but it buys us time to figure out good strategies for vertical sharding and to learn along the way. For vertical sharding to work, it will also be crucial to have the centralized overflow queue, so work can be offloaded between shards through workstealing. All in all: vertical sharding is a much bigger change than horizontal sharding.
>>> 
>>> We tried to implement everything in a single actor first, but that seemed to impose a bottleneck again. Note that this code is on a very hot path and needs to be very tight; that might not match the actor model too well.
>>> 
>>> Summing up: this keeps all proposed changes intact (most notably Tyson's parallel executions and the overflow queue).
>>> 
>>> A note on the gains made by this: our non-blocking invoke performance is now quite close to the raw Kafka produce performance we have in the system (not that it's good in itself, but that's the next theoretical bottleneck). Before these changes, throughput was roughly bottlenecked at 4k requests per second on a sufficiently powerful machine. Blocking invoke performance roughly doubled.
>>> 
>>> Any comments, thoughts?
>>> 
>>> Cheers,
>>> Markus
>>> 
>> 
>> 
> 
