openwhisk-dev mailing list archives

From "Markus Thoemmes" <markus.thoem...@de.ibm.com>
Subject Connection reuse in the Invoker (HttpUtils)
Date Tue, 03 Jul 2018 13:47:33 GMT
Hi OpenWhiskers,

I (and potentially Tyson in https://github.com/apache/incubator-openwhisk/issues/3151) found
a pretty nasty bug. It only shows up sporadically, so hear me out:

Today, we employ a connection pool when talking to our user-containers. The assumption is
that connections are reused between requests, as a performance improvement. If a container
is not used for some time (aka the pause-grace period), we pause it, effectively freezing the
container. While paused, the container (and its socket) will not answer any requests coming
from the outside.

When that container is to be used again, we resume it and then try to reuse one of the connections
mentioned above. These connections can go stale, though. The server might close them without
further notice, and my expectation is that pausing/resuming adds another layer of uncertainty
about the state of the connection. The PoolingConnectionManager performs a staleness check once
a connection has been idle for 2 seconds. This staleness check involves writing into the
socket and observing what happens. Per the documentation (https://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/config/RequestConfig.html),
it's an expensive procedure that can take up to 30 ms, which in my opinion completely negates
the performance optimization it's geared towards. After all, we might not be reusing many
connections anyway if they go stale so quickly due to pause/resume.
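To make that rule concrete, here is a tiny stdlib-only Java sketch of the decision (the class and method names are hypothetical — this is not the actual HttpUtils or HttpClient code, just a model of when the expensive check kicks in):

```java
// Hypothetical model of HttpClient's "validate after inactivity" rule.
// Any connection that has sat idle for the threshold or longer pays the
// (up to ~30 ms) stale check before it can be reused.
public class StaleCheckModel {
    // HttpClient's default inactivity threshold before re-validating.
    static final long VALIDATE_AFTER_INACTIVITY_MS = 2_000;

    /** True if reusing this connection triggers the stale check. */
    static boolean paysStaleCheck(long idleMillis) {
        return idleMillis >= VALIDATE_AFTER_INACTIVITY_MS;
    }

    public static void main(String[] args) {
        // A burst within the pause-grace reuses the socket cheaply...
        System.out.println(paysStaleCheck(150));
        // ...but a paused/resumed container has typically been idle far longer.
        System.out.println(paysStaleCheck(60_000));
    }
}
```

The point: under bursty reuse the check never fires, but after a pause/resume it fires essentially every time, so we pay the validation cost on top of the uncertainty.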

Furthermore, there is this problem (per: https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html):

> HttpClient tries to mitigate the problem by testing whether the connection is 'stale',
> that is no longer valid because it was closed on the server side, prior to using the
> connection for executing an HTTP request. The stale connection check is not 100% reliable.
> The only feasible solution that does not involve a one thread per socket model for idle
> connections is a dedicated monitor thread used to evict connections that are considered
> expired due to a long period of inactivity. The monitor thread can periodically call
> ClientConnectionManager#closeExpiredConnections() method to close all expired connections
> and evict closed connections from the pool. It can also optionally call
> ClientConnectionManager#closeIdleConnections() method to close all connections that
> have been idle over a given period of time.
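For reference, the monitor thread the tutorial describes could be sketched roughly like this (stdlib-only; `ConnPool` is a hypothetical stand-in for ClientConnectionManager, and the 30-second idle cutoff is an arbitrary illustration, not a value from our code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ClientConnectionManager, exposing only the
// two eviction methods the HttpClient tutorial mentions.
interface ConnPool {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

public class IdleConnectionEvictor {
    /** The eviction work the tutorial's monitor thread performs each tick. */
    static Runnable evictionTask(ConnPool pool) {
        return () -> {
            pool.closeExpiredConnections();                   // drop expired entries
            pool.closeIdleConnections(30, TimeUnit.SECONDS);  // drop long-idle ones
        };
    }

    /** Runs the eviction task on a dedicated thread at a fixed period. */
    static ScheduledExecutorService start(ConnPool pool, long period, TimeUnit unit) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(evictionTask(pool), period, period, unit);
        return exec;
    }
}
```

That is the machinery we would have to bolt on if we wanted to keep cross-pause connection reuse, which is why I call it overengineered for our case below.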

I think in our case this is a bit overengineered, so I propose the following:

We throw away all connections in the pool upon pausing the container. A connection will be
reestablished after resuming it. This still reuses connections under bursts (within the
pause-grace period) but safely recreates them across a pause/unpause cycle. As I said above,
this might be pretty close to the behavior we're getting anyway, but it makes things less
prone to timing errors and much more explicit.
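In pseudocode terms, the proposal amounts to something like this stdlib-only sketch (all names here are hypothetical — the real change would live in HttpUtils / the container pause path, not in a class like this):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of the proposal: the per-container client drops every
// pooled connection when the container is paused, so a resumed container
// always opens a fresh connection instead of gambling on a stale one.
public class PausableClient {
    private final Deque<Closeable> pool = new ArrayDeque<>();

    /** Returns a connection to the pool after a request completes. */
    void offer(Closeable conn) { pool.push(conn); }

    /** Called on pause: close and discard all pooled connections. */
    void onPause() {
        while (!pool.isEmpty()) {
            try {
                pool.pop().close();
            } catch (IOException ignored) {
                // The connection may already be half-dead; discard regardless.
            }
        }
    }

    int pooled() { return pool.size(); }
}
```

Within the pause-grace window nothing changes (the pool keeps its warm connections); only the pause transition pays the cost, and that is exactly when the connections were about to go stale anyway.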

This might well explain the strange issues we saw when we first used akka-http for the invoker
-> user-container communication. @Tyson this would then have an impact on the work you're
doing.

Let me know what you think,
-m


