openwhisk-dev mailing list archives

From Tyson Norris <tnor...@adobe.com.INVALID>
Subject Re: Invoker activation queueing proposal
Date Thu, 05 Oct 2017 17:13:07 GMT
OK, yes, was confused by “could be 2N *or* N+1” queues. 

RE: occupancy time in ready queue: yes, it should be super low; stay with Kafka for now, agreed
that's a separate discussion.

RE: overflow queue - I’m not sure of the use in having it “per invoker”, so I would plan
to make it shared (a single overflow queue). i.e. if it's per invoker, the same scheduling problem
exists, except now you have a chance to reschedule when the overflow queue is processed; but in
that case, just schedule it once, only at the time it is landing in the ready queue. This
makes it a) simpler (fewer topics) and b) arguably more fair (overflow items waiting longest
get priority once capacity is available on any invoker).
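
A minimal sketch of the shared-overflow idea, in Python for brevity. This is not OpenWhisk
code: `Invoker`, `LoadBalancer`, `submit`, and `complete` are hypothetical names, in-memory
lists stand in for the Kafka topics, and "first invoker with capacity" is only a placeholder
scheduling policy.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Invoker:
    name: str
    capacity: int  # assumed per-invoker limit on concurrent activations
    assigned: list = field(default_factory=list)  # stand-in for the invoker's ready topic

    def has_capacity(self) -> bool:
        return len(self.assigned) < self.capacity


class LoadBalancer:
    def __init__(self, invokers):
        self.invokers = list(invokers)
        self.overflow = deque()  # single shared overflow queue (FIFO)

    def _pick(self):
        # Placeholder policy: first invoker with spare capacity, else None.
        return next((i for i in self.invokers if i.has_capacity()), None)

    def submit(self, activation):
        """Dispatch now if possible; otherwise park in the shared overflow queue."""
        # Anything already waiting in overflow goes first, so new work queues behind it.
        target = self._pick() if not self.overflow else None
        if target is None:
            self.overflow.append(activation)
            return None
        target.assigned.append(activation)
        return target.name

    def complete(self, invoker, activation):
        """Free a slot, then drain overflow (oldest first) onto any invoker with capacity."""
        invoker.assigned.remove(activation)
        scheduled = []
        while self.overflow:
            target = self._pick()
            if target is None:
                break
            item = self.overflow.popleft()
            target.assigned.append(item)
            scheduled.append((item, target.name))
        return scheduled
```

The point of the sketch is that an activation is bound to an invoker only when it lands in
that invoker's ready queue, so the longest-waiting overflow item goes to whichever invoker
frees up first.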



> On Oct 4, 2017, at 11:41 PM, Rodric Rabbah <rodric@gmail.com> wrote:
> 
> The ready queue is per invoker - it has to be (and is basically the current implementation;
> I’m not convinced you need to change that from Kafka at first: what’s the expected occupancy
> in the Kafka queue, per invoker, vs any other implementation you have in mind?).
> 
> The overflow queue could be per invoker or not. Today both queues are conflated into
> a single topic per invoker. Hence, two queues.
> 
> -r
> 
>> On Oct 4, 2017, at 10:49 PM, Tyson Norris <tnorris@adobe.com.INVALID> wrote:
>> 
>> So "dispatches immediately to an invoker" is currently implemented as a Kafka topic-per-invoker.
>> If we don't change that Kafka transport to something else, I'm not sure what you have in mind
>> to get down to "only 2 queues". In other words the "ready queue" seems problematic to dispatch
>> via a single queue to multiple invokers if based on the current Kafka impl.
>> 
>> 
>>> On Oct 4, 2017, at 6:29 PM, Rodric Rabbah <rodric@gmail.com> wrote:
>>> 
>>> What you’re describing is generalized to essentially two queues: the ready
>>> queue, which dispatches immediately to an invoker with capacity, and an overflow queue. Whether
>>> the latter is one per invoker or global may not matter much as long as it’s drained by the
>>> load balancer and not committed otherwise to the invoker ready queue where it is difficult
>>> to reassign to an available invoker later (à la work stealing). There are pathologies one has
>>> to guard against and we should consider fairness policies as well. There’s good theory in
>>> this space that I can imagine using to model the heuristics.
>>> 
>>> -r
>>> 
>>>> On Oct 4, 2017, at 8:39 PM, Tyson Norris <tnorris@adobe.com.INVALID> wrote:
>>>> 
>>>> Not sure what you mean by two queues - you mean two queues per invoker? Or total?
>>>> 
>>>>> On Oct 4, 2017, at 5:14 PM, Rodric Rabbah <rodric@gmail.com> wrote:
>>>>> 
>>>>> A two queue (topic) approach can mitigate the lack of random access from
>>>>> a Kafka topic/queue once a request is committed (in today’s architecture). This could enable
>>>>> work stealing in particular since the (overflow) queue can be drained upstream (i.e. load balancer)
>>>>> and reassigned to free invokers. Balancing cold start and container locality would then be
>>>>> a heuristic we can apply more judiciously as capacity is available (versus the current approach
>>>>> which binds too early and prevents rebalancing).
>>>>> 
>>>>> -r
>>>>> 
>>>>>> On Oct 4, 2017, at 7:45 PM, Tyson Norris <tnorris@adobe.com.INVALID> wrote:
>>>>>> 
>>>>>> Hi -
>>>>>> I’ve been discussing a bit with a few people about optimizing the queueing
>>>>>> that goes on ahead of invokers so that things behave more simply and predictably.
>>>>>> 
>>>>>> In short: Instead of scheduling activations to an invoker on receipt,
>>>>>> do the following:
>>>>>> - execute the activation "immediately" if capacity is available
>>>>>> - provide a single overflow topic for activations that cannot execute "immediately"
>>>>>> - schedule from the overflow topic when capacity is available
>>>>>> 
>>>>>> (BTW "immediately" means: still queued via existing invoker topics,
>>>>>> but ONLY gets queued there in the case that the invoker is not fully loaded, and therefore
>>>>>> should execute it "very soon")
>>>>>> 
>>>>>> Later: it would also be good to provide more container state data
>>>>>> from invoker to controller, to get better scheduling options - e.g. if some invokers can handle
>>>>>> running more containers than other invokers, that info can be used to avoid over/under-loading
>>>>>> the invokers (currently we assume each invoker can handle 16 activations, I think)
>>>>>> 
>>>>>> I put a wiki page proposal here: https://cwiki.apache.org/confluence/display/OPENWHISK/Invoker+Activation+Queueing+Change
>>>>>> 
>>>>>> WDYT?
>>>>>> 
>>>>>> Thanks
>>>>>> Tyson
