tomcat-dev mailing list archives

From Rainer Jung <rainer.j...@kippdata.de>
Subject Re: BIO performance issues
Date Wed, 04 May 2011 14:20:36 GMT
I hope the following is not too long and confusing ...

On 03.05.2011 22:02, Mark Thomas wrote:
> Scenario
> --------
> This ended up being very long, so I moved it to the end. The exact
> pattern of delays will vary depending on timeouts, request frequency
> etc. but the scenario shows an example of how delays can occur. The
> short version is that requests with data to process (particularly new
> connections) tend to get delayed in the queue waiting for a thread to
> process them when the threads are all tied up processing keep-alive
> connections.
>
> Root cause
> ----------
> The underlying cause of all of the performance issues observed is when
> the threads are tied up doing HTTP keep-alive when there is no data to
> process but there are other connections in the queue that do have data
> that could be processed.
>
> Solution A
> ----------
> NIO is designed to handle this using a poller. That isn't available to
> BIO so I attempted to simulate it. That generated excessive CPU load so
> I do not think simulated polling is the right solution.

I expect generating the SocketTimeoutException is expensive, because the 
JVM has to generate the stack information. When handling mostly keep-alive 
connections (the extreme case), the exception rate is the number of 
threads divided by your "poll" timeout, e.g. 200 threads with a 100ms 
timeout is 2000 exceptions per second. Even if there is another reason for 
the high CPU load, I expect it to be roughly proportional to the poll 
rate. In a saturated system with lots of keep-alive you will have:

pollRate = 1 / pollTimeout * maxThreads
(e.g. 1 / 0.1s * 200 = 2000/s)
averageWaitBeforePoll = maxConnections / pollRate / 2
(e.g. 10000 / (2000/s) / 2 = 2.5s)

So we see that in your case, although we already have a high poll event 
rate, each connection still waits 2.5 seconds on average before it is 
polled, which is far too much request latency. If we want to reduce this 
latency, we would need to increase the rate, but then CPU gets even 
worse. Or we need to reduce maxConnections.

Let us try a different sizing:

maxThreads 200, maxConnections 1000 (less overcommitment, but still 
very different from 200), pollTimeout 200ms.

pollRate = 1000/s, half of the previous rate due to the doubled timeout.
averageWaitBeforePoll = 0.5 seconds.

Although this is an improvement, we still have a high poll rate and even 
0.5 seconds average wait time for new connections isn't nice.
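
To make it easy to play with other sizings, here is the same arithmetic 
as a throwaway Java snippet (names are mine, not anything from the 
connector code):

public class PollSizing {
    // Rough estimate of the poll rate and the average wait before a
    // parked connection gets polled, for the simulated-poll BIO setup.
    static void estimate(int maxThreads, int maxConnections,
                         double pollTimeoutSec) {
        double pollRate = maxThreads / pollTimeoutSec;      // polls per second
        double avgWait  = maxConnections / pollRate / 2.0;  // seconds
        System.out.printf("pollRate=%.0f/s averageWaitBeforePoll=%.2fs%n",
                          pollRate, avgWait);
    }

    public static void main(String[] args) {
        estimate(200, 10000, 0.1); // -> pollRate=2000/s averageWaitBeforePoll=2.50s
        estimate(200, 1000, 0.2);  // -> pollRate=1000/s averageWaitBeforePoll=0.50s
    }
}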

The tradeoff is: to be CPU efficient, we have to reduce the poll rate. 
Assuming a fixed thread and connection count, this automatically leads 
to a longer averageWaitBeforePoll, i.e. higher request latency. There 
seems to be no sweet spot for sizing the system.
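
Put differently, combining the two formulas above gives

pollRate * averageWaitBeforePoll = maxConnections / 2

independent of maxThreads and pollTimeout. For a fixed connection count 
the product of CPU cost (proportional to the poll rate) and latency is a 
constant, so whatever we gain on one side we lose on the other.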

If we do not find a way to handle the keep-alive connections that is 
efficient in terms of CPU and thread blocking time, then I don't expect 
a solution to the problem - except for disabling keep-alive or not 
accepting many more connections than we have threads. In the end that is 
the "disable keep-alive once 75% of the threads are busy" solution. One 
could throw in some "reduce keep-alive timeout under load" feature, but 
I doubt it would help much more than the simple solution.
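
The check itself is trivial - something like the following sketch, where 
the 75% threshold and the parameter names are only illustrative:

// Stop offering keep-alive once three quarters of the workers are busy.
static boolean allowKeepAlive(int busyThreads, int maxThreads) {
    return busyThreads < maxThreads * 0.75;
}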

Do we see a way of handling many keep-alive connections that is 
efficient in both CPU time and thread blocking time? I don't see any API 
that would help here. Of course one could try to build a hybrid 
"blocking for normal processing but non-blocking for keep-alive" thing, 
but since we already have NIO I would also support recommending NIO for 
keep-alive.
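
Just to sketch what I mean by such a hybrid (a rough illustration only, 
not connector code - class and method names are made up): a single 
thread parks the keep-alive sockets on a java.nio Selector and hands 
them back to the ordinary blocking worker threads once data arrives.

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;

// Rough sketch only: one thread watches all keep-alive sockets with a
// Selector; request processing itself stays on blocking worker threads.
public class KeepAlivePoller implements Runnable {

    private final Selector selector;
    private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();
    private final Executor workers;

    public KeepAlivePoller(Executor workers) throws IOException {
        this.selector = Selector.open();
        this.workers = workers;
    }

    // Called by a worker thread after it has written a response: instead
    // of blocking in read() for the next request, the socket is parked
    // here and the thread returns to the pool.
    public void park(SocketChannel channel) throws IOException {
        channel.configureBlocking(false);
        pending.add(channel);
        selector.wakeup();          // registration happens on the poller thread
    }

    @Override
    public void run() {
        while (true) {
            try {
                SocketChannel ch;
                while ((ch = pending.poll()) != null) {
                    ch.register(selector, SelectionKey.OP_READ);
                }
                selector.select();  // sleeps until a parked socket has data
                List<SocketChannel> ready = new ArrayList<>();
                for (SelectionKey key : selector.selectedKeys()) {
                    key.cancel();   // take the socket away from the selector
                    ready.add((SocketChannel) key.channel());
                }
                selector.selectedKeys().clear();
                selector.selectNow();  // flush cancelled keys so blocking mode works again
                for (SocketChannel channel : ready) {
                    channel.configureBlocking(true);
                    workers.execute(() -> process(channel));
                }
            } catch (IOException e) {
                // log and keep polling
            }
        }
    }

    private void process(SocketChannel channel) {
        // hand the socket back to the normal blocking request processing;
        // when the response is written the worker calls park(channel) again
    }
}

Whether that extra complexity is worth it compared to simply pointing 
people at the NIO connector is exactly the question.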

Switching the default from BIO to NIO is a big change, but only after we 
switch will we find the last buglets and problems arising under rare 
conditions. So if we want to switch, we should do it very soon. Doing it 
late in the TC 7 cycle would be bad.

Lastly: AJP is used for server-to-server connections, as is HTTP when a 
reverse proxy sits in front of Tomcat. In those cases we have far fewer 
connections with a higher rate of requests per connection. There 
maxThreads == maxConnections is fine (and even the 75% rule could be 
switched off). So for this scenario it would be nice not to drop BIO, at 
least until the major TC version after the default has switched to NIO.

> Solution B
> ----------
> Return to the Tomcat 6 implementation where maxConnections == maxThreads.
>
> Additional clean-up
> -------------------
> maxConnections is unnecessary in APR since pollerSize performs the same
> function.
>
> Summary
> -------
> The proposed changes are:
> a) restore disabling keep-alive when threads used>= 75% of maxThreads
> b) remove maxConnections and associated code from the APR connector
> c) remove the configuration options for maxConnections from the BIO
> connector
> d) use maxThreads instead of maxConnections for the BIO connector
> e) update the docs

I agree (especially after your additional clarifications in reply to 
Konstantin).

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

