tomcat-users mailing list archives

From Mark Eggers <its_toas...@yahoo.com>
Subject Re: TC 6.0.20 Cleanup after application crash
Date Wed, 10 Aug 2011 22:23:55 GMT
----- Original Message -----

> From: Dante Bell <DantePasquale@cocoanet.us>
> To: Tomcat Users List <users@tomcat.apache.org>
> Cc: Christopher Schultz <chris@christopherschultz.net>
> Sent: Wednesday, August 10, 2011 11:26 AM
> Subject: Re: TC 6.0.20 Cleanup after application crash
> 
> Hi Chris,
> 
> I did indeed read and digest Mark's email and talked to the vendor about
> that issue. The stack trace on the old blog post is from the one Mark
> was helping out with (man, that was a really bad sentence!).
> 
> This is a different issue :( I don't have a stack trace and I don't have
> access to the lab they are running these tests in. I've requested the
> stack traces when this happens, but haven't received those yet.
> 
> Your question about 'crash' is valid and the explanation I received was
> that the load test application crashes. That's all I have at this time
> from them. I'm helping them from a dark, distant planet and only see the
> things they want me to see ;) Weirdly, from what they are telling me,
> it doesn't sound like TC is dead: after 15 minutes it starts serving up db
> responses!
> 
> Yes, they are using mod_jk.
> 
> 
> 
> On 08/10/2011 12:55 PM, Christopher Schultz wrote:
>>  Dante,
>> 
>>  On 8/10/2011 11:57 AM, Dante Bell wrote:
>>  > We are seeing that after an application crash (customized load
>>  > tester with minimal error handling so it crashes often)
>> 
>>  When you say "crash", do you mean you get a stack trace in the logs and
>>  Tomcat stays up, or do you mean that you bring down the JVM? If you
>>  bring down the JVM, what is the error that is occurring (check hs_*.txt
>>  files lying around in the working directory for that)?
>> 
>>  > that TC isn't releasing the connection for about 15 minutes.
>> 
>>  If TC is truly dead, then it's not holding connections at all. That
>>  would be the OS holding them.
>> 
>>  What makes you think they are not being "released"? What counts as
>>  "released"?
>> 
>>  > I've reviewed some of the worker directives, but I'm really unsure as
>>  > to which one or combination would shorten this interval
>>  > significantly.
>> 
>>  Does that mean you are using mod_jk/mod_proxy_ajp? Good to have that
>>  kind of information.
>> 
>>  > The Apache server still serves up static content, which makes me
>>  > think that there isn't anything at the OS or Apache layer that is
>>  > causing the connection to hang around (granted, this isn't an
>>  > absolute and we are investigating these 2 components also).
>> 
>>  So you're using Apache httpd, too. Also good to know.
>> 
>>  > We've done some minor TCP/IP tuning in the Solaris stack, and that
>>  > has helped with other issues regarding heavy loads.
>> 
>>  On Solaris.
>> 
>>  > If TC is the culprit, would we need to be setting the advanced
>>  > connector directives such as:
>> 
>>  > |recovery_options        |4: close the connection to Tomcat, if we
>>  > detect an error when writing back the answer to the client (browser)
>> 
>>  That depends upon what the errors actually are. Care to tell us about
>>  them?
>> 
>>  > PS. Configs can be found at: http://bit.ly/pFIzO0
>> 
>>  Sigh. You should look into "template" workers.
>> 
>>  Apache httpd MaxClients setting default is 256. <Connector> MaxThreads
>>  is set to 750, so Tomcat should have almost 3 times more than you need.
>>  Where do you see 750 stuck threads?
>> 
>>  I looked at your thread dump. You clearly have not read Mark's previous
>>  response on this list where he told you exactly what was happening: your
>>  webapp is killing itself with these SingleThreadModel servlets. This is
>>  not thread starvation due to configuration, this is thread starvation
>>  due to a poorly-implemented web application.
>> 
>>  > Apache:* Apache HTTP Server Version 2.2 -- prefork with mpm *Tomcat:*
>>  > 6.0.20 *JK Connector:* Same as whatever is bundled in with Apache 2.2
>>  > (from customer) *Solaris* Solaris 10 10/09 s10s_u8wos_08a SPARC
>> 
>>  Aah, here's all the configuration information. Description then context.
>>  Not the best term paper I've ever read. :(
>> 
>>  I think you mean "prefork MPM". Apache httpd does not bundle mod_jk.
>>  Check your version.


As is usual for me, this will be horrifically long, and I apologize for that in advance. Here
are the CliffsNotes first.

1. Clean up your httpd.conf - it's a mess
   Notes in the main message

2. Clean up your workers.properties - it's not a mess, but certainly missing things
   Notes and an example in the main message

3. Clean up your AJP Connector in server.xml - it's a mess
   Notes and an example in the main message

4. Use JMeter - well-tested, robust, freely available testing tool
   http://jakarta.apache.org/jmeter/

5. Fix the application - there really is no other viable solution

And now for the novel . . .

* Introduction

This will be a long and rambling set of comments on the entire
configuration. I will try to address issues as I see them. I will also
note missing information as I go.

I don't have any hard and fast solutions to the problems that are
being posted. However, the first order of business is to clean up the
existing issues noted below. Once those issues are addressed, the
underlying causes of the problems can be investigated.

In short, it's often very difficult to see the forest for the trees
when working with problems like this.

* The Platform

OS:      Solaris 10
JRE:     unknown
HTTPD:   2.2.17 prefork (the default on UNIX and Linux)
MOD_JK:  unknown
Tomcat:  6.0.20

First of all, it would be nice to know the versions of those listed as
"unknown". As has been noted in the mailing list, mod_jk does not
come with Apache HTTPD. Some of the configuration notes for
workers.properties depend on which version of mod_jk you are using.

HTTPD 2.2.17 is not horribly out of date. According to the web site,
2.2.19 is the latest released version. Fixes in 2.2.19 (strictly
speaking, in 2.2.18, which was abandoned) that may concern you include
the following:

  *) Core HTTP: disable keepalive when the Client has sent Expect:
     100-continue but we respond directly with a non-100 response.
     Keepalive here led to data from clients continuing being treated
     as a new request.  PR 47087.  [Nick Kew]

  *) prefork: Update MPM state in children during a graceful restart.
     Allow the HTTP connection handling loop to terminate early during
     a graceful restart.  PR 41743.  [Andrew Punch <andrew.punch
     247realmedia.com>]

  *) mod_ssl: Correctly read full lines in input filter when the line
     is incomplete during first read. PR 50481. [Ruediger Pluem]

Tomcat 6.0.20 is out of date. The current version is 6.0.32, and I
imagine 6.0.33 will be out soon. I won't post the changelog here, but
there are many important fixes.

* Configurations

I will be a bit hamstrung in commenting on your configurations,
mainly because of the missing information about mod_jk. If you don't
know its version, you may be able to find out by doing the following:

strings mod_jk.so | grep mod_jk/

On my system (Fedora 15, kernel 2.6.40 - which is 3.0) this returns:

mod_jk/1.2.32 ()
mod_jk/1.2.32
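If `strings` is not available on the box, the same search can be done in a few lines of Python. This is a minimal sketch, assuming the version marker is embedded in the module exactly as shown above (`mod_jk/<version>`); the function names are my own and not part of any tool.

```python
import re

# The version marker that `strings mod_jk.so | grep mod_jk/` finds,
# e.g. b"mod_jk/1.2.32 ()". Only the dotted number is captured.
_VERSION_RE = re.compile(rb"mod_jk/(\d+(?:\.\d+)+)")


def mod_jk_versions(data: bytes) -> list[str]:
    """Distinct mod_jk version strings embedded in a binary, in order found."""
    found: list[str] = []
    for match in _VERSION_RE.finditer(data):
        version = match.group(1).decode("ascii")
        if version not in found:
            found.append(version)
    return found


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn "1.2.23" into (1, 2, 23) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))
```

Pass the bytes of mod_jk.so (read in binary mode) to `mod_jk_versions()`; comparing `version_tuple(v) < (1, 2, 24)` tells you whether the JkLogStampFormat observation further down applies to your build.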

** HTTPD Configuration

Since this is not the Apache HTTPD mailing list, I won't make a lot of
comments about the general configuration here. It is pretty much a
mess, and the maintainers of this need to clean it up before going
into production.

*** Defaults Used

ServerAdmin you@example.com
ServerName mycompany.com:80

These are the defaults and should be changed.

LoadModule proxy_module libexec/mod_proxy.so
LoadModule proxy_connect_module libexec/mod_proxy_connect.so
LoadModule proxy_ftp_module libexec/mod_proxy_ftp.so
LoadModule proxy_http_module libexec/mod_proxy_http.so
LoadModule proxy_scgi_module libexec/mod_proxy_scgi.so
LoadModule proxy_ajp_module libexec/mod_proxy_ajp.so
LoadModule proxy_balancer_module libexec/mod_proxy_balancer.so

If your server is not secured, this is a security issue: a misconfigured
mod_proxy can turn the server into an open proxy. Since you are using
mod_jk (see lines later in the configuration file), I can see no reason
to load proxy_ajp_module. I suspect that there is no reason to load any
of the proxy modules, but I've not gone through the configuration
carefully.

Interestingly enough, mod_proxy and mod_proxy_http are both commented
out later in the configuration file.

LoadModule dav_module libexec/mod_dav.so
LoadModule dav_fs_module libexec/mod_dav_fs.so

This allows (with proper configuration) remote users to edit files on
the server via the WebDAV protocol. I'm not sure you would want this
on a customer-facing web server. Perhaps you do, and it does seem to be
enabled here:

# Distributed authoring and versioning (WebDAV)
Include conf/extra/httpd-dav.conf

You don't have any prefork configuration, so you're using the
defaults. These are:

StartServers         5
MinSpareServers      5
MaxSpareServers      10
ServerLimit          256
MaxClients           256
MaxRequestsPerChild  10000

This means that the HTTPD server can handle 256 simultaneous
requests. You can read in the documentation what the other numbers
mean, but the names are pretty self-evident.

The 256 figure matters for the Connector element configuration. The
largest number of simultaneous connections this server can handle is
256, so the largest number of requests that can be forwarded to Tomcat
at any one time is also 256. This has an impact on your server.xml
file, as noted below.
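To make the arithmetic concrete, here is a back-of-the-envelope sketch using the prefork defaults above and the maxThreads="750" from your server.xml. It assumes the prefork MPM (each child handles one request at a time, and MaxClients cannot exceed ServerLimit) and one mod_jk worker per Tomcat; the function names are hypothetical.

```python
def max_forwarded_requests(max_clients: int, server_limit: int) -> int:
    """Most requests a prefork httpd can hand to Tomcat at the same moment.

    Each prefork child serves one request at a time, and MaxClients is
    capped by ServerLimit, so the smaller value is the ceiling.
    """
    return min(max_clients, server_limit)


def idle_thread_surplus(max_threads: int, max_clients: int, server_limit: int) -> int:
    """AJP connector threads that requests from this httpd can never occupy."""
    return max(0, max_threads - max_forwarded_requests(max_clients, server_limit))


# Values from the prefork defaults and the current server.xml.
current = idle_thread_surplus(max_threads=750, max_clients=256, server_limit=256)
# Tomcat's default maxThreads of 200 against the same httpd.
default = idle_thread_surplus(max_threads=200, max_clients=256, server_limit=256)
```

With the defaults, `current` works out to 494 threads this httpd can never put to work, while the default of 200 leaves no surplus at all.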

Finally, there is a lot of SSL configuration in httpd.conf, but
mod_ssl is commented out. 

*** mod_jk configuration

I'm only going to comment in detail on lines that are uncommented in
the httpd.conf file. There are a lot of other issues that I'll just
mention.

1. There are many lines that perform the same forwarding function

For example:

JkMount /MyCfg/servlet/* worker1

This already covers

JkMount /MyCfg/servlet/Login worker1

2. If all of your workers go to the same host and port (which means
   the same Tomcat), why are there multiple workers configured?

The above lines (and others like it) look suspiciously like the
application is using the Invoker servlet. By default this is disabled
in Tomcat 6 due to security concerns.

Since the web application was written with NetBeans (I recognize the
doProcess() method), there is no reason not to map the servlets to
appropriate URLs in web.xml.

Please post $CATALINA_HOME/conf/web.xml with comments removed.
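Returning to point 1: if you want to find the redundant JkMount lines mechanically, a rough sketch like the following could help. It takes (pattern, worker) pairs as copied out of httpd.conf and reports exact mounts already covered by a wildcard mount to the same worker. It uses Python's `fnmatchcase` as a stand-in for mod_jk's `*` and `?` wildcards, which is an approximation and not mod_jk's own matching code.

```python
from fnmatch import fnmatchcase


def redundant_mounts(mounts: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """(path, worker) JkMount entries already covered by a wildcard mount.

    `mounts` holds (url_pattern, worker) pairs in httpd.conf order.
    fnmatchcase approximates mod_jk matching: "*" and "?" behave the
    same way, but fnmatch also gives "[...]" a special meaning.
    """
    wildcards = [(p, w) for p, w in mounts if "*" in p or "?" in p]
    redundant = []
    for path, worker in mounts:
        if "*" in path or "?" in path:
            continue  # wildcard mounts themselves are not redundant
        for pattern, pattern_worker in wildcards:
            if pattern_worker == worker and fnmatchcase(path, pattern):
                redundant.append((path, worker))
                break
    return redundant


mounts = [
    ("/MyCfg/servlet/*", "worker1"),
    ("/MyCfg/servlet/Login", "worker1"),
    ("/ACT", "worker2"),
    ("/ACT/*", "worker2"),
]
```

Here `redundant_mounts(mounts)` flags only `/MyCfg/servlet/Login`. Note that `/ACT` is not flagged: under this matching, `/ACT/*` does not cover the bare `/ACT`, which is why the configuration mounts both.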

Stripping down everything, your current mod_jk configuration looks
like the following.

JkWorkersFile      "/mycompany/apps/myfm/fmserver/Tomcat/conf/workers.properties"
JkLogFile          /usr/apache2_cgems/logs/mod_jk.log 
JkLogLevel         error 
JkLogStampFormat   "[%a %b %d %H:%M:%S %Y] "
JkOptions          +ForwardKeySize +ForwardURICompat -ForwardDirectories 
JkRequestLogFormat "%w %V %T"

JkMount  /ACT worker2
JkMount  /ACT/* worker2

A couple of quick comments here.

You don't have JkShmFile, jk-status, or jk-manager configured. These
are useful for seeing what's going on inside mod_jk.

There is no need for quotes around the JkWorkersFile name.

Since workers.properties is a mod_jk configuration file (and therefore
part of the Apache HTTPD setup), I normally put it with all of the other
Apache HTTPD configuration files (/etc/httpd/conf.d on Fedora 15).

The JkLogStampFormat is the default for mod_jk prior to 1.2.24, so I'm
going to guess that your mod_jk may actually be 1.2.23 or older. If
so, time to upgrade. See the notes above on one way to determine this.

-ForwardDirectories is the default.
+ForwardKeySize is the default.
+ForwardURICompat was the default until mod_jk 1.2.22

According to the documentation at
http://tomcat.apache.org/connectors-doc/reference/apache.html,
+ForwardURICompat is less spec compliant and is not safe if you are
using prefix JkMounts. As I read it, unless you map only exact URLs,
this option results in unsafe operation. Your configuration does use a
prefix mount (JkMount /ACT/* worker2).

** workers.properties

Since the only worker you are using in httpd.conf is worker2, the
following is sufficient.

# Minimal jk configuration
worker.list=worker2
worker.worker2.type=ajp13
worker.worker2.host=localhost
worker.worker2.port=8019

However, a more explicit configuration may be desired. This all
depends on your version of mod_jk. A while back I posted a
workers.properties file to the list in answer to another question. An
abbreviated version of that is shown below.

worker.list=worker2
#
# template
#
# Notes on configuration
# type                   - ajp13 which is the protocol and the default
# socket_connect_timeout - in milliseconds (what happens when Tomcat
#                          is started later?)
# socket_keepalive       - send keep alive packets when connection is
#                          idle
# ping                   - how to do the keep alive (see
#                          documentation)
# ping_timeout           - in milliseconds (10000 is the default)
# minsize                - minimum pool size - drops to zero after a
#                          while
# timeout                - connection pool timeout, in seconds. Must
#                          match connectionTimeout on the AJP
#                          connector in server.xml (which is in
#                          milliseconds). Note, there is no timeout by
#                          default in server.xml
# reply_timeout          - timeout for a reply. The default is no
#                          timeout. The value is in milliseconds. Make
#                          it longer than the longest time Tomcat takes
#                          to process a request, otherwise an error will
#                          be returned.
# recovery_options       - a bitmapped flag for recovery when a
#                          request is successfully sent but no reply
#                          is received. 0 is the default, 3 says don't
#                          retry on another backend

worker.template.type=ajp13
worker.template.host=localhost
worker.template.socket_connect_timeout=5000
worker.template.socket_keepalive=true
worker.template.ping_mode=A
worker.template.ping_timeout=10000
worker.template.connection_pool_minsize=0
worker.template.connection_pool_timeout=600
worker.template.reply_timeout=300000
worker.template.recovery_options=3

#
# now to define the actual workers
#
worker.worker2.reference=worker.template
worker.worker2.port=8019

This is based on the configurations found in
tomcat-connectors-[version]-src/conf. I think this started appearing
in version 1.2.31. That's the earliest version I have unpacked on my
system at any rate.
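The recovery_options value in the template is a bitmask, so 3 is bits 1 and 2 combined. A small decoder makes that explicit. The bit descriptions are paraphrased from the mod_jk workers reference (bit 4 is the one quoted earlier in this thread); treat the table as an assumption to check against the documentation for your mod_jk version, since newer releases may define additional bits.

```python
# Bit meanings paraphrased from the mod_jk workers.properties reference.
# Newer mod_jk releases may define more bits; unknown bits are ignored here.
RECOVERY_BITS = {
    1: "don't recover if Tomcat failed after getting the request",
    2: "don't recover if Tomcat failed after sending the headers to the client",
    4: "close the connection to Tomcat if writing the answer to the client fails",
    8: "always recover HEAD requests (even if bit 1 or 2 is set)",
    16: "always recover GET requests (even if bit 1 or 2 is set)",
}


def decode_recovery_options(value: int) -> list[str]:
    """Behaviours switched on by a recovery_options bitmask, lowest bit first."""
    if value == 0:
        return ["default: no bits set, the request may be retried on another backend"]
    return [text for bit, text in sorted(RECOVERY_BITS.items()) if value & bit]
```

`decode_recovery_options(3)` lists the two "don't recover" behaviours used in the template, and `decode_recovery_options(4)` is the option asked about earlier in the thread.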

One thing to note here: connection_pool_timeout must be the same as
the connectionTimeout value of the AJP connector in server.xml. The
value here is in seconds; the value in server.xml is in milliseconds.
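Because the two settings use different units, this is easy to get wrong. A trivial check, with hypothetical function names:

```python
def connection_timeout_ms(pool_timeout_s: int) -> int:
    """server.xml connectionTimeout (ms) for a workers.properties connection_pool_timeout (s)."""
    return pool_timeout_s * 1000


def timeouts_match(pool_timeout_s: int, tomcat_connection_timeout_ms: int) -> bool:
    """True when workers.properties and server.xml agree on the idle timeout."""
    return connection_timeout_ms(pool_timeout_s) == tomcat_connection_timeout_ms
```

By this rule your current connectionTimeout="10000" would pair with connection_pool_timeout=10, not 600.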

I do not understand why you have the other workers configured. They
all go to the same host. Apache HTTPD will only open 256 connections
(max) by default. I cannot think of a reason why you don't just have
one worker per Tomcat.
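If you do end up keeping several workers, the template approach above keeps each one down to a couple of lines. The sketch below shows how a `reference` is resolved: every setting of the referenced worker is copied, then the worker's own settings override it. That is my reading of the mod_jk documentation, not mod_jk's code; the parser only handles plain key=value lines and does not detect reference cycles.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_worker(props: dict[str, str], name: str) -> dict[str, str]:
    """Settings for worker `name` with any `reference` template applied.

    Referenced settings are copied first, then the worker's own settings
    override them. Chains of references are followed; cycles are not detected.
    """
    prefix = f"worker.{name}."
    own = {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}
    resolved: dict[str, str] = {}
    reference = own.pop("reference", None)
    if reference is not None:
        # Reference values look like "worker.template".
        template = reference[len("worker."):] if reference.startswith("worker.") else reference
        resolved.update(resolve_worker(props, template))
    resolved.update(own)
    return resolved


cfg = """
worker.list=worker2
worker.template.type=ajp13
worker.template.host=localhost
worker.template.connection_pool_timeout=600
worker.worker2.reference=worker.template
worker.worker2.port=8019
"""
worker2 = resolve_worker(parse_properties(cfg), "worker2")
```

`worker2` ends up with the template's type, host, and pool timeout plus its own port, which is the whole point of the template.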

** server.xml

I will just comment on the portion that has to do with the AJP
connections. Note that I have a much longer connection pool timeout
than you do, and will be changing the connectionTimeout value
accordingly.

    <Connector port="8019"
       connectionTimeout="10000"
       maxThreads="750"
       minSpareThreads="20"
       maxSpareThreads="50"
       request.TomcatAuthentication="false"
       protocol="AJP/1.3"
       redirectPort="8445" />

There are several issues here that need to be addressed.

1. connectionTimeout="10000"

This must match connection_pool_timeout in workers.properties (600
seconds), so in this example it should be 600000 milliseconds.

2. maxThreads="750"

In your current HTTPD configuration, you can never have more than 256
connections from HTTPD to Tomcat. The default maxThreads value is 200.
Since you said that Apache HTTPD also serves some static content
(requests that never reach Tomcat), leaving this at the default is
probably a good idea.

3. minSpareThreads, maxSpareThreads

I don't see either of these in the Tomcat 6 documentation.

4. request.TomcatAuthentication="false"

According to the documentation, if you do not want Tomcat to process
authentication (and it appears this way from your Apache HTTPD
configuration), the attribute is tomcatAuthentication="false".

5. Encoding

By default, the URIEncoding is set to ISO-8859-1. You might wish to
change that to UTF-8.
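To see why this matters, consider a URL path containing a non-ASCII character. Browsers typically percent-encode such paths as UTF-8 (an assumption about your clients; check yours). Decoding those bytes as ISO-8859-1 garbles the character, while UTF-8 recovers it. A minimal illustration with Python's standard library, standing in for what the connector does with each URIEncoding:

```python
from urllib.parse import quote, unquote

# A path segment containing "é", percent-encoded as UTF-8 bytes.
encoded = quote("café")                            # "caf%C3%A9"

# Decoding the same bytes with each charset.
as_latin1 = unquote(encoded, encoding="latin-1")   # "cafÃ©" (ISO-8859-1, the default)
as_utf8 = unquote(encoded, encoding="utf-8")       # "café"  (URIEncoding="UTF-8")
```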

Applying the above changes to your AJP connector configuration (and
reflecting the 600 second timeout in workers.properties), the
following Connector element is arrived at.

    <Connector port="8019"
       connectionTimeout="600000"
       tomcatAuthentication="false"
       URIEncoding="UTF-8"
       protocol="AJP/1.3"
       redirectPort="8445" />

* Load Test Tool Crash

I really cannot comment on this since it's a custom built tool. Are
there reasons for not using something like JMeter?

* Other Application Issues [Soapbox below]

Over the weekend I wrote a quick Single Thread Model servlet and poked
around with JMX. I didn't see any way to tell what was going on
without doing a thread dump. Once you reach the limit of 20 STM threads, I'm not
sure what you would do. Would you kill one or more threads? How? Which
one would you choose? If you could kill a thread running the STM
servlet, how would you tell Tomcat that there's another slot available
for another STM thread? What state would Tomcat end up in if you could
kill off a thread running an STM servlet?
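A toy model of that limit, assuming (as in Tomcat 6's StandardWrapper) that at most 20 STM instances exist and each serves one request at a time. The semaphore stands in for Tomcat's instance pool; this illustrates the starvation and is not Tomcat's implementation.

```python
import threading

# Tomcat 6 pools at most 20 instances of a SingleThreadModel servlet
# (StandardWrapper's maxInstances); each instance serves one request at a time.
STM_POOL_SIZE = 20

# Toy stand-in for Tomcat's STM instance pool.
stm_pool = threading.BoundedSemaphore(STM_POOL_SIZE)


def try_claim_instance(pool: threading.BoundedSemaphore) -> bool:
    """Claim an STM instance without waiting; False means the request would block."""
    return pool.acquire(blocking=False)


# 21 requests arrive and none of them finishes (e.g. all stuck on the database).
claimed = [try_claim_instance(stm_pool) for _ in range(STM_POOL_SIZE + 1)]
```

The 21st request gets nothing until one of the first 20 finishes and releases its slot, and from the outside there is no clean way to force that release.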

In short, fix the application. STM servlets provide a false sense of
thread safety at any rate. STM does not protect context attributes
from modification by other servlets. Session variables are probably
also not thread safe (one browser, two tabs?).
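A small Python model of the problem, with hypothetical names: STM-style locking is per instance, so two pooled instances can be inside service() at the same time, and anything they share (here a dict standing in for context attributes) needs its own lock. In a real servlet the equivalent would be synchronizing on the shared object, or using a concurrent data structure for it.

```python
import threading

# Hypothetical stand-in for ServletContext attributes shared by every servlet.
context_attributes = {"hits": 0}
# A lock owned by the shared state itself, not by any one servlet instance.
context_lock = threading.Lock()


class StmLikeServlet:
    """Toy model of one pooled SingleThreadModel instance."""

    def __init__(self) -> None:
        # STM only guarantees one request at a time *per instance*.
        self.instance_lock = threading.Lock()

    def service(self) -> None:
        with self.instance_lock:   # what STM gives you
            with context_lock:     # what shared state actually needs
                context_attributes["hits"] += 1


def instances_run_concurrently(first: StmLikeServlet, second: StmLikeServlet) -> bool:
    """True if both instance locks can be held at once (no cross-instance safety)."""
    got_first = first.instance_lock.acquire(blocking=False)
    got_second = second.instance_lock.acquire(blocking=False)
    try:
        return got_first and got_second
    finally:
        if got_first:
            first.instance_lock.release()
        if got_second:
            second.instance_lock.release()


a, b = StmLikeServlet(), StmLikeServlet()
concurrent = instances_run_concurrently(a, b)

# 100 simulated requests spread over the two pooled instances.
requests = [threading.Thread(target=s.service) for s in (a, b) for _ in range(50)]
for t in requests:
    t.start()
for t in requests:
    t.join()
```

`concurrent` comes out True, which is exactly why the shared counter stays correct only because of `context_lock`, not because of the per-instance locking STM provides.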

I suspect that the original authors were trying to get around the
non-idempotent nature of POSTs. This plus the possible use of the
Invoker servlet leads me to believe that this is an old application
ripe for a rewrite.

. . . . just my nickel (since it's a long post)
/mde/


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

