tomcat-users mailing list archives

From Scott McClanahan <scott.mcclana...@trnswrks.com>
Subject Re: mod_jk error detection
Date Wed, 25 Jul 2007 18:49:11 GMT
On Wed, 2007-07-25 at 17:00 +0200, Rainer Jung wrote:

> Hi,
> 
> good questions. First of all: I just today wrote a new docs page about 
> timeouts. We are soon releasing 1.2.24 which contains this page. You can 
> already look at it under
> 
> http://people.apache.org/~rjung/mod_jk-dev/docs/
> 
> (The new page is named "Timeouts" and is part of the group Generic Howtos.)
> 
> Also the new docs contain a better explanation of what retries means, 
> especially the huge difference between retries for an lb worker and a 
> usual worker. This info is on the updated workers.properties page in the 
> reference guide.
> 
> > With these settings how could I expect the connector to behave if:
> > 
> > 1.  Tomcat dies and the port is no longer listening resulting in an
> > immediate icmp response.
> 
> I would expect that any attempt to use an existing connection or to 
> open a new one immediately returns with an error, because the remote 
> machine rejects the communication. Further JK behaviour then depends 
> on whether you are using a load balancer or not. See retries etc. in 
> the updated docs.
> 
> > 2.  The box hosting tomcat dies or the tcp stack for whatever reason
> > tanks resulting in no immediate icmp response.
> 
> As long as your local system or the last router still has an arp entry 
> for the dead machine, you will run into very long TCP timeouts. We 
> recommend CPing/CPong, see the new Timeouts page.
> 
> > 3.  The connector does make a successful connection to the backend
> > tomcat worker only to have that worker become slow and almost
> > unresponsive.
> 
> You should use CPing/CPong and reply timeouts. See again the new 
> Timeouts page. If you don't use an lb, the best you can do is to throw 
> an error early, so that the rest of the infrastructure doesn't get 
> congested.
> 
> > Are there more directives I should be concerned with?  Currently, I have
> > no intentions on monitoring the http response status codes to detect
> > errors.
> 
> Look at the new page and look at the workers.properties page of the 
> reference guide. Use a load balancing worker, set recovery_options etc.
> 
> HTH.
> 
> Regards,
> 
> Rainer
> 
> P.S.: If you have suggestions how to improve the new page: it's not 
> public yet. If you are fast enough, we can include those changes.
> 


I thoroughly enjoyed the updated docs.  It is just what I needed.  I
just want to mention a few inferences I have now from reading it.

In a load balanced setup, using connect_timeout and prepost_timeout will
keep both newly established connections (a rare event due to
persistence) and each individual request from being sent to a failed
tomcat node, based on CPING/CPONG messages.  These messages only detect
whether the container (I'm using tomcat) is healthy enough to respond to
such a message, but not necessarily anything more, correct?  Basically,
that its ajp listener is responsive.  Plus, if I need faster error
detection I can use reply_timeout.  Does that sound correct?
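
A minimal workers.properties sketch of such a setup, for illustration
only.  The worker names, hosts, ports and timeout values are
placeholders and not taken from the docs or from a real configuration;
the three timeouts are given in milliseconds.

```
# Hypothetical two-node load balancer; names, hosts and ports are placeholders.
worker.list=lb

worker.node1.type=ajp13
worker.node1.host=tomcat1.example.com
worker.node1.port=8009
# CPING/CPONG right after a new connection is opened (ms)
worker.node1.connect_timeout=10000
# CPING/CPONG before each request is forwarded (ms)
worker.node1.prepost_timeout=10000
# Maximum wait between two response packets from tomcat (ms)
worker.node1.reply_timeout=60000

worker.node2.type=ajp13
worker.node2.host=tomcat2.example.com
worker.node2.port=8009
worker.node2.connect_timeout=10000
worker.node2.prepost_timeout=10000
worker.node2.reply_timeout=60000

worker.lb.type=lb
worker.lb.balance_workers=node1,node2
```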

I am confused by the recovery_options section.  How does it work in a
load balanced environment?  If tomcat receives a request and processes
some of it, then fails catastrophically before completing the response,
what exactly happens when the request is repeated, assuming
recovery_options is set to 0?
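
For reference, the setting in question as it might appear in
workers.properties.  The worker name node1 is a placeholder, and the
meaning of the commented-out values is my reading of the reference
guide, so treat it as an assumption.

```
# 0 is the default: no restriction on recovering a failed request.
worker.node1.recovery_options=0

# Assumed bit values (they add up):
#   1 = do not recover if tomcat failed after receiving the request
#   2 = do not recover if tomcat failed after sending headers to the client
# worker.node1.recovery_options=3
```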

Also, I am confused by the section describing the retries directive.
In a load balanced environment, would the connector retry regardless of
the TCP state of the connection, even if it is already established?
Would it retry against the same backend tomcat server?  The reason I ask
is because the docs say "If the load balancer can not get a free
connection for a member worker from the pool, it will try again a
number of times given by retries." I highlighted the words that confuse
me.

Once a worker goes into error state, should we expect the connector to
attempt to send a valid request to that backend tomcat every 60 seconds
and fail, or is the worker only checked with CPING/CPONG requests during
the maintenance cycle?
