openwhisk-dev mailing list archives

From Carlos Santana <csantan...@gmail.com>
Subject Re: Active acks from invoker to controller
Date Tue, 25 Sep 2018 12:39:34 GMT
With a concurrency of 1, which is our default, the action container is not
freed up until log collection is finished, so the controller should not
schedule a new activation to the invoker until then, regardless of whether
the activation is blocking or non-blocking.

I like the idea of the invoker being more specific about the progress of
the invocation lifecycle: send one active-ack when the invocation is done,
and a second one when log collection is done (unless the log limit is 0 or
the invoker is configured not to pull logs).
An operator can configure the invoker not to deal with logs, for example on
Kubernetes by delegating to pod log collection, or by having the runtime
log in a certain format for another system to collect.

I think an active-ack, regardless of whether the activation is blocking or
non-blocking, should be sent back as soon as the result is available to the
invoker, and the message should include, as you said, an indication that
the invocation is only partially done (i.e. busy collecting logs, expect
another message when done).

Then on the other side, the controller's load balancer, or a specialized
load balancer plugged in via the SPI, can decide whether the invoker slot
is free or not. I would say that in the default load balancer the invoker
slot would not be free until the second message arrives, indicating that
log collection and anything else going on with the invocation is done.
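
To make the two-message idea concrete, here is a minimal sketch of how a default load balancer could treat the two active-acks. It is not OpenWhisk code: `AckPhase`, `ActiveAck`, `LoadBalancer` and their fields are hypothetical names invented for illustration, and the real controller is written in Scala.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class AckPhase(Enum):
    # Hypothetical phases; the thread only says "a field" marks the first ack.
    RESULT_READY = "result_ready"  # result available, logs may still be collected
    SLOT_FREE = "slot_free"        # log collection and cleanup are finished


@dataclass
class ActiveAck:
    activation_id: str
    invoker: int
    phase: AckPhase
    result: Optional[dict] = None


@dataclass
class LoadBalancer:
    slots_per_invoker: int
    busy: Dict[int, Set[str]] = field(default_factory=dict)  # invoker -> activation ids
    results: Dict[str, dict] = field(default_factory=dict)   # activation id -> result

    def schedule(self, activation_id: str, invoker: int) -> bool:
        """Occupy a slot on the invoker, or refuse if none is free."""
        in_use = self.busy.setdefault(invoker, set())
        if len(in_use) >= self.slots_per_invoker:
            return False
        in_use.add(activation_id)
        return True

    def on_ack(self, ack: ActiveAck) -> None:
        if ack.phase is AckPhase.RESULT_READY:
            # The blocking caller can be answered now; the slot stays occupied.
            self.results[ack.activation_id] = ack.result
        elif ack.phase is AckPhase.SLOT_FREE:
            # Only the second message releases the slot.
            self.busy.get(ack.invoker, set()).discard(ack.activation_id)
```

With `slots_per_invoker=1`, a second activation is refused after the first ack and accepted only after the `SLOT_FREE` ack, while the result is already available to the client after the first one.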

The benefit for blocking invokes is that the result is at hand while logs
are still being collected, so the client gets a faster response from a web
action, or the next step in a sequence can progress sooner.

Maybe this already matches your first option, Christian?

I would not penalize blocking invokes (web actions, sequences) by making
them wait for logs; they don't need the logs and should progress as fast
as they can.

-- Carlos


On Tue, Sep 25, 2018 at 4:14 AM Christian Bickel <cbickel@apache.org> wrote:

> Hi,
>
> today, we execute the user-action in the invoker, send the active-ack
> back to the controller and collect logs afterwards.
> This has the following implication:
> - controller receives the active ack, so it thinks the slot on the
> invoker is free again.
> - BUT the invoker is still collecting logs, which means that the
> next activation has to wait until log collection is finished.
> Especially when log-collection takes long (e.g. because of high CPU
> load on the invoker-machine), user-actions have to wait longer and
> longer over time.
>
> If this happens, you will read the following message in the invoker:
> `Rescheduling Run message, too many message in the pool, freePoolSize:
> 0 containers and 0 MB, busyPoolSize ...`
>
> But it definitely makes sense to send the active-ack (at least for
> blocking activations) to the controller as fast as possible, because
> the controller should answer the request as fast as possible.
>
> So my proposal is to differentiate between blocking and non-blocking
> activations. The invoker today already knows whether an activation is
> blocking or not.
> If the activation is non-blocking, we hold the active-ack until
> log collection is finished.
> If the activation is blocking, we send an active-ack right away, like
> today, with a field indicating that log collection is not finished
> yet, and a second active-ack after log collection is finished.
>
> With this behaviour, the user gets its response as fast as possible on
> blocking activations and the loadbalancer waits with dispatching,
> until the slot is freed up.
>
> I also did a test to verify performance.
> For this test, I took a system with 100 invokers and space for 32
> 256MB actions on each invoker. (Two controllers, 1 Kafka)
> I used our Gatling test `BlockingInvokeOneActionSimulation`. The
> action of the test writes one log line and returns the input
> parameters again.
> The test executed all activations blocking, which means, that two
> active-acks have been sent per activation.
> I used 2880 parallel connections, which should result in 90% system
> utilisation (blackbox-fraction is set to 0).
> As you can see, this scenario generates the most possible active-acks.
> To the result:
> The throughput per second is at 97% compared to the current master.
> The response times are also nearly the same.
> So there is nearly no regression in the worst case scenario.
> In addition, I looked for the log-message I mentioned above in the
> invoker. It has not been written in the test with my changes, but
> thousands of times on the master.
> For non-blocking requests I don't expect any regression, but the
> waiting-time on the invoker should be less.
>
> Another valid approach would be to hold the active-ack until
> log collection is finished (independent of blocking or non-blocking).
> If the action is executed blocking, we could say that it's the user's
> responsibility to not log too much or to set the log limit to 0 to get
> fast responses.
>
> Does anyone have an opinion on which of the two approaches we should
> pursue? Or does anyone have another idea?
>
> Greetings
> Christian
>
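
The 90% utilisation figure in the quoted test setup can be checked from the numbers given there. This is only a back-of-the-envelope sketch, assuming each parallel blocking connection keeps exactly one action slot occupied at a time; the variable names are illustrative.

```python
invokers = 100              # from the quoted test setup
slots_per_invoker = 32      # 32 actions of 256MB per invoker
parallel_connections = 2880

# Assumption: one in-flight blocking request occupies one slot.
total_slots = invokers * slots_per_invoker
utilisation = parallel_connections / total_slots

print(f"{total_slots} slots, {utilisation:.0%} utilised")
```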
