tomcat-users mailing list archives

From "Asankha C. Perera" <asan...@apache.org>
Subject Re: Handling requests when under load - ACCEPT and RST vs non-ACCEPT
Date Wed, 07 Nov 2012 04:01:27 GMT
Hi Chris
> My expectation from the backlog is:
>
> 1. Connections that can be handled directly will be accepted and work
> will begin
>
> 2. Connections that cannot be handled will accumulate in the backlog
>
> 3. Connections that exceed the backlog will get "connection refused"
>
> There are caveats, I would imagine. For instance, do the connections in
> the backlog have any kind of server-side timeouts associated with them
> -- that is, will they ever get discarded from the queue without ever
> being handled by the bound process (assuming the bound process doesn't
> terminate or anything weird like that)? Do the clients have any timeouts
> associated with them?
>
> Does the above *not* happen? On which platform? Is this only with NIO?
I am not a Linux-level TCP expert, but my understanding is that the TCP 
layer has its own timeouts, and older connection requests will get 
discarded from the queue. Typically a client has a TCP-level timeout as 
well, i.e. the time it will wait for the other party to accept its SYN 
packet. My testing has been primarily on Linux / Ubuntu.
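
As a concrete illustration, this hypothetical snippet shows where the 
backlog is requested from plain Java - the second argument to bind() is 
only a hint, and the kernel may cap it (e.g. via net.core.somaxconn on 
Linux):

    import java.net.InetSocketAddress;
    import java.net.ServerSocket;

    public class BacklogDemo {
        public static void main(String[] args) throws Exception {
            ServerSocket server = new ServerSocket();
            // The second argument is the requested backlog; completed
            // connections wait in this queue until accept() is called,
            // and the OS may silently cap the value we ask for.
            server.bind(new InetSocketAddress(8280), 100);
            while (true) {
                server.accept().close();
            }
        }
    }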

Leaving everything to the TCP backlog makes end clients see nasty RSTs 
instead of "connection refused" when Tomcat is under load - and could 
prevent a client from performing a clean fail-over when one Tomcat node 
is overloaded.
> So you are eliminating the backlog entirely? Or are you allowing the
> backlog to work as "expected"? Does closing and re-opening the socket
> clear the existing backlog (which would cancel a number of waiting
> though not technically accepted connections, I think), or does it retain
> the backlog? Since you are re-binding, I would imagine that the backlog
> gets flushed every time there is a "pause".
I am not sure how the backlog behaves under different operating systems 
and conditions. However, the code I've shared shows how a pure Java 
program can take better control of the underlying TCP behavior, as 
visible to its clients.
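
I won't repeat the full example here, but the core of the idea is just 
to close the listening channel when the node is saturated, so that new 
clients immediately see "connection refused", and to re-bind once 
capacity frees up - something along these lines (port and backlog 
values are arbitrary):

    import java.net.InetSocketAddress;
    import java.nio.channels.ServerSocketChannel;

    public class PausableAcceptor {
        private final InetSocketAddress address = new InetSocketAddress(8280);
        private ServerSocketChannel channel;

        public synchronized void resume() throws Exception {
            channel = ServerSocketChannel.open();
            // keep the backlog small so refusals happen quickly
            channel.bind(address, 1);
            // register the channel with the selector / acceptor thread here
        }

        public synchronized void pause() throws Exception {
            if (channel != null) {
                // new connection attempts are refused from here on; the OS
                // decides what happens to anything still in the backlog
                channel.close();
                channel = null;
            }
        }
    }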
> What about performance effects of maintaining a connector-wide counter
> of "active" connections, plus pausing and resuming the channel -- plus
> re-connects by clients that have been dropped from the backlog?
What the UltraESB does by default is stop accepting new connections 
once a threshold is reached (e.g. 4096) and remain paused until the 
number of active connections drops back below another threshold (e.g. 
3073). Each of these parameters is user configurable, and depends on 
the maximum number of connections each node is expected to handle. In 
my experience, maintaining connector-wide counts does not cause any 
measurable performance impact, and neither do re-connects by clients - 
since what is expected in reality is for a hardware load balancer to 
forward requests that are "refused" by one node to another node, which 
hopefully is not loaded.
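
In rough Java terms the watermark logic is just a shared counter - the 
thresholds and the pause()/resume() hooks below are placeholders for 
illustration, not actual UltraESB or Tomcat code:

    import java.util.concurrent.atomic.AtomicInteger;

    public class ConnectionGate {
        private static final int HIGH_WATERMARK = 4096; // stop accepting here
        private static final int LOW_WATERMARK  = 3073; // resume accepting here
        private final AtomicInteger active = new AtomicInteger();
        private volatile boolean paused = false;

        void connectionOpened() {
            if (active.incrementAndGet() >= HIGH_WATERMARK && !paused) {
                paused = true;
                // acceptor.pause();  i.e. close the listening socket
            }
        }

        void connectionClosed() {
            if (active.decrementAndGet() <= LOW_WATERMARK && paused) {
                paused = false;
                // acceptor.resume(); i.e. re-bind and accept again
            }
        }
    }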

Such a fail-over can take place immediately, cleanly and without any 
cause for confusion, even if the backend service is not idempotent. 
This is clearly not the case when a TCP/HTTP connection is accepted and 
then met with a hard RST after part or all of a request has been sent 
on it.
> I'm concerned that all of your bench tests appear to be done using
> telnet with a single acceptable connection. What if you allow 1000
> simultaneous connections and test it under some real load so we can see
> how such a solution would behave?
Clearly the example I shared was just to illustrate this with a pure 
Java program. We usually conduct performance tests across half a dozen 
open source ESBs with concurrency levels of 20, 40, 80, 160, 320, 640, 
1280 and 2560, and payload sizes of 0.5, 1, 5, 10 and 100K bytes. You 
can see some of the scenarios here: http://esbperformance.org. We 
privately conduct performance tests beyond 2560 to much higher levels. 
We have used an HttpComponents based EchoService as our backend service 
all this time, and it behaved very well at all load levels. However, 
some weeks back we accepted a contribution, an async servlet to be 
deployed on Tomcat, as it was considered more "real world". The issues 
I noticed were when running high load levels against this servlet 
deployed on Tomcat, especially when the response was being delayed to 
simulate realistic behavior.
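
The contributed servlet is not mine to paste here, but the general 
shape of an async servlet that delays its response would be roughly as 
follows (the delay and response body are made up for illustration):

    import java.io.IOException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import javax.servlet.AsyncContext;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    @WebServlet(urlPatterns = "/service", asyncSupported = true)
    public class DelayedEchoServlet extends HttpServlet {
        private final ScheduledExecutorService timer =
            Executors.newScheduledThreadPool(2);

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
            final AsyncContext ctx = req.startAsync();
            // complete the response after an artificial delay to simulate
            // a slow backend service
            timer.schedule(new Runnable() {
                public void run() {
                    try {
                        ctx.getResponse().getWriter().write("OK");
                    } catch (IOException ignored) {
                    } finally {
                        ctx.complete();
                    }
                }
            }, 2, TimeUnit.SECONDS);
        }
    }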

Although we do not use Tomcat ourselves, our customers do. I am also 
not calling this a bug, but rather pointing to an area for possible 
improvement. If the Tomcat users, developers and the PMC think this is 
worthwhile to pursue, I believe it would be a good enhancement - maybe 
even a good GSoC project. As a fellow member of the ASF and a committer 
on multiple projects over the years, I believed it was my duty to bring 
this to the attention of the Tomcat community.

regards
asankha

-- 
Asankha C. Perera
AdroitLogic, http://adroitlogic.org

http://esbmagic.blogspot.com





