tomcat-users mailing list archives

From Rainer Jung <rainer.j...@kippdata.de>
Subject Re: mod_jk 1.2.28 errors
Date Thu, 28 Oct 2010 10:08:58 GMT
On 26.10.2010 01:05, Hannaoui, Mo wrote:
> 1. When there are > 30 http connections, I see the error below almost
> every 1 minute. As the traffic and the number of connections increase,
> the frequency of error increases and the performance of the web
> application that is being hosted on the system decreases.
>
>
>
> [Mon Oct 25 20:59:42 2010][11224:3086337808] [info]
> ajp_process_callback::jk_ajp_common.c (1882): Writing to client aborted
> or client network problems

This message tells us that a problem was detected while sending the
response back to the browser, most likely a connection abort or similar.

This will happen every now and then when users do not wait for a
response and instead proceed to click on other links. If it happens too
often, it might indicate that either your application is not
responsive enough, so users have a reason to start clicking while
waiting, or you have an infrastructural problem on the way back to
the browser.

Note that the messages are only flagged as "[info]" because, as said,
occasional occurrence is not problematic.

If you want to decide whether this is happening due to bad performance,
you should

- add "%P %{tid}P %D" to your LogFormat for the Apache access log. This
will log the process id, the thread id (for prefork MPM that's always
"1") and the duration in microseconds. You can use the pid and tid to
correlate with the jk log messages. In the jk log line it is
"[11224:3086337808]": the first number is the pid, the second the tid.

Note that the timestamp in the access log is when the request started,
while the timestamp in the JK log is when the response was detected as
broken. The delta should be roughly what is being logged as %D. Choose a
couple of occurrences, find their counterparts in the access log and see
whether they took especially long. You can also look at the URLs, the
user agents, the client IPs etc., all via the access log.

- add an access log to Tomcat and do not forget to add %D to the log 
pattern as well. Check whether the same, likely long running requests 
also take long according to the Tomcat access log. Note that %D for 
Tomcat logs milliseconds, not microseconds like for Apache.

If you find many examples where Tomcat logs a short time and Apache a
long time, then you likely have a network/firewall/load-balancer/whatever
problem between Apache and the browser, especially if file sizes are not
huge. In that case Tomcat will be able to stream back to Apache, which
will be able to put all of the response in the TCP send buffer, but
Apache will nevertheless log the error if the content finally cannot
be transmitted.

- next you can start sniffing to find out what the root cause actually
was from the point of view of Apache, e.g. whether a reset was sent by
the client. I have run into cases where security devices every now and
then reset foreign connections that they thought looked like an attack.
This is easy to detect with a network sniff: in that case the MAC
address from which the reset was sent was different from the MAC address
that sent the rest of the connection packets.

- finally you can try to work your way closer to the browser by doing
sniffs further up the network.
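
The log correlation described in the steps above could be scripted
roughly as follows. This is only a sketch: it assumes the jk log uses the
timestamp layout shown in the quoted log line (English day and month
names), and the function names and the sample %D values are made up for
illustration.

```python
import re
from datetime import datetime, timedelta

# Prefix of a mod_jk log line: [timestamp][pid:tid] [level]
JK_PREFIX = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<pid>\d+):(?P<tid>\d+)\]\s+\[(?P<level>\w+)\]"
)


def parse_jk_prefix(line):
    """Return (timestamp, pid, tid, level) from a mod_jk log line, or None."""
    m = JK_PREFIX.match(line)
    if m is None:
        return None
    # Assumes the timestamp layout of the quoted log line.
    ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    return ts, int(m.group("pid")), int(m.group("tid")), m.group("level")


def estimated_request_start(jk_timestamp, apache_duration_us):
    """Approximate access log timestamp: jk error time minus Apache %D."""
    return jk_timestamp - timedelta(microseconds=apache_duration_us)


def apache_minus_tomcat_ms(apache_duration_us, tomcat_duration_ms):
    """Milliseconds not accounted for by Tomcat (Apache %D is in
    microseconds, Tomcat %D in milliseconds)."""
    return apache_duration_us / 1000.0 - tomcat_duration_ms


if __name__ == "__main__":
    line = ("[Mon Oct 25 20:59:42 2010][11224:3086337808] [info] "
            "ajp_process_callback::jk_ajp_common.c (1882): Writing to client "
            "aborted or client network problems")
    ts, pid, tid, level = parse_jk_prefix(line)
    print(pid, tid, level, ts)
    # Hypothetical %D values, for illustration only.
    print(estimated_request_start(ts, 12_000_000))
    print(apache_minus_tomcat_ms(12_000_000, 350))
```

Use the pid, the tid and the estimated start time to find the matching
access log line. A large positive difference between the Apache and
Tomcat durations points to the path between Apache and the browser.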

> 2. The number of connections will suddenly surge from say 40 to 90 to
> ~200 in no time, at which point all I see in mod_jk.log is error
> messages and the application either stops responding with the connection
> refused or bad gateway error. To fix the problem the Jboss service
> usually needs to be restarted. This surge is unpredictable and may happen
> between 1 and 5 times in 24 hours.

This indicates a performance problem with the app (or GC problems).
Observed concurrency is roughly:

concurrency = requests per second * average response time

If the concurrency spikes, it is usually the response time that
spikes. Add "%D" to the access logs to verify.
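
To make the relation concrete, here is a small sketch with made-up
numbers. The request rate and the average %D values are hypothetical,
chosen only so that the result matches the 40 and ~200 connections
mentioned in the question.

```python
def expected_concurrency(requests_per_second, avg_response_seconds):
    """concurrency = requests per second * average response time."""
    return requests_per_second * avg_response_seconds


def apache_d_to_seconds(duration_us):
    """Convert an Apache %D value (microseconds) to seconds."""
    return duration_us / 1_000_000


if __name__ == "__main__":
    rate = 80  # hypothetical requests per second, held constant
    for avg_us in (500_000, 2_500_000):  # hypothetical average %D values
        print(avg_us, expected_concurrency(rate, apache_d_to_seconds(avg_us)))
```

With the request rate unchanged, a fivefold increase in average response
time alone takes the concurrency from 40 to 200.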

If so, start doing Java thread dumps to analyze what's happening in
JBoss. Also look at per-thread CPU load using ps to check whether there
are particular threads that take too much CPU. Finally check GC activity.
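
One way to connect the per-thread ps output with a thread dump is
sketched below. It assumes Linux, where ps can list the decimal id (LWP)
of each thread, and a HotSpot-style dump, where each thread header
carries the same id in hex as "nid=0x...". The helper names and the
sample dump line are made up for illustration.

```python
import re

# Native thread id as printed in a HotSpot-style thread dump header.
NID = re.compile(r"nid=(0x[0-9a-fA-F]+)")


def lwp_to_nid(lwp):
    """Convert a decimal thread id from ps into the hex form used by nid=."""
    return hex(int(lwp))


def find_thread(dump_text, lwp):
    """Return the dump header line whose nid matches the ps thread id."""
    wanted = int(lwp)
    for line in dump_text.splitlines():
        m = NID.search(line)
        if m and int(m.group(1), 16) == wanted:
            return line
    return None


if __name__ == "__main__":
    # Hypothetical dump excerpt, for illustration only.
    dump = ('"http-8080-1" daemon prio=10 tid=0x08a3c400 nid=0x2bf2 '
            'runnable [0x6f5fe000]\n'
            '"http-8080-2" daemon prio=10 tid=0x08a3d800 nid=0x2bf3 '
            'waiting on condition [0x6f5ad000]')
    print(lwp_to_nid(11250))
    print(find_thread(dump, 11250))
```

Take the thread ids of the busiest threads from ps and look up their
stacks in a dump taken at about the same time.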

> I have read many posts and documents (including
> http://kbase.redhat.com/faq/docs/DOC-15866 and used
> http://lbconfig.appspot.com/ for base configurations) and changed the
> configurations many times, but the problem continues to exist. I think
> my current configuration is the worst version so far. It works well only
> with low traffic.
>
> Here's the current configuration:
>
> --- workers.properties ----
...


> worker.template.reply_timeout=30000

Might be a bit short. Check with your %D logged values. Please do also 
add max_reply_timeouts to your load balancer.

...

> worker.template.socket_timeout=10

I personally don't like the general socket_timeout. I do like the more 
fine-grained individual timeouts.

The source download of mod_jk 1.2.30 contains a well-documented example 
configuration (1.2.28 does not). Further "official" notes about timeouts 
are available at:

http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html

Regards,

Rainer


