tomcat-users mailing list archives

From "Carl" <>
Subject Re: Tomcat dies suddenly
Date Fri, 12 Feb 2010 12:20:43 GMT
This problem continues to plague me.

A quick recap so you don't have to search your memory or archives.

The 10,000-foot view: new Dell T105 and T110 servers, Slackware 13.0 (64-bit),
the latest Java (64-bit) and the latest Tomcat. The machines run only Tomcat
and a small, special-purpose Java server (which I have also moved to another
machine to make certain it wasn't causing any problems). Periodically,
Tomcat just dies, leaving no tracks in any log that I have been able to find.
The application has run on Slackware 12.1 (32-bit) for several years
without problems (apart from application bugs). I have run Memtest86 for 30
hours on the T110 with no problems reported.

More details: the Dell T105 has an AMD processor and (currently) 8 GB of
memory. The T110 has a Xeon 3440 processor and 4 GB of memory. The current
Java version is 1.6.0_18-b07. The current Tomcat version is 6.0.24.

The servers are lightly loaded, with fewer than 100 sessions active at any
one time.

All of the following trials have produced the same results:

1.  Tried openSUSE (64-bit).

2.  Tried 32-bit Slackware 13.

3.  Increased the memory in the T105 from 4 GB to 6 GB and finally to 8 GB.

4.  Have fiddled with the JAVA_OPTS settings in  The current 
settings are:

JAVA_OPTS="-Xms512m -Xmx512m -XX:PermSize=384m -XX:MaxPermSize=384m \
    -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+PrintGCDetails \
    -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError"

I can see the incremental GC activity in both catalina.out and VisualVM.
Note the fairly small (512 MB) heap; watching VisualVM indicates that it is
sufficient (when a failure occurs, VisualVM reports the last amount of
memory used, and this is always well under the maximum for both the heap and
PermGen).
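
Because VisualVM only shows the last sample it took before the JVM vanished,
it may help to have the JVM log its own memory pools at a fixed interval, so
the trend leading up to a failure ends up in a file. A minimal sketch using
the standard java.lang.management API; the class name PoolLogger and the
60-second interval are illustrative assumptions. Pool names depend on the
collector (with CMS they typically include "CMS Old Gen" and "CMS Perm Gen"),
so the code simply prints whatever pools the JVM reports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Tomcat or the JDK.
public final class PoolLogger {

    /** Builds one line per memory pool: pool name, used MB and max MB (-1 if undefined). */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // getUsage() returns null for a pool that is no longer valid
            }
            long maxMb = u.getMax() < 0 ? -1 : u.getMax() / (1024 * 1024);
            lines.add(String.format("%s used=%dMB max=%dMB",
                    pool.getName(), u.getUsed() / (1024 * 1024), maxMb));
        }
        return lines;
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        for (int i = 0; i < samples; i++) {
            for (String line : snapshot()) {
                System.out.println(System.currentTimeMillis() + " " + line);
            }
            if (i + 1 < samples) {
                Thread.sleep(60000L); // assumed 60 s sampling interval
            }
        }
    }
}
```

Inside Tomcat this would more naturally run on a daemon thread started from a
ServletContextListener; when Tomcat is started with catalina.sh, anything
written to System.out normally lands in catalina.out.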

More information about the failures:

1.  They are clean kills: I can restart Tomcat immediately after a failure
and there is no port conflict. As I understand it, this implies either that
the Linux process was killed (I have manually killed the java process with
kill -9 and got the same result that I observe when the system fails) or
that Tomcat was shut down normally, e.g., using (this always leaves tracks
in catalina.out, and I am not seeing any, so I do not believe this is the
case).
2.  They appear to be load-related. On heavy processing days the system
might fail every 15 minutes, but with lighter processing it could run for up
to 10 days without failure. I have found a way to force more frequent
failures. We have four WARs deployed (I will call them A, B, C and D). They
are all the same application, but we use this arrangement to provide access
to different databases. A user accesses the correct application by or B,
etc. A is used for production while the others have specific purposes.
Thus, A is always used while the others are used periodically. If users
start coming in on B, C and/or D, the failure occurs within hours (Tomcat
shuts down, bringing all of the users down, of course). Note that the
failure still does not happen immediately.

3.  They do not appear to be caused by memory restrictions, because 1) the
old server had only 2 GB of memory and ran well, 2) I have tried adding
memory to the new servers with no change in behavior, and 3) top and the
Slackware system monitor indicate that the system is not starved for
memory. In fact, yesterday, running on the T105 with 8 GB of memory, top
never reported more than 6 GB in use (with no swap in use), yet Tomcat
failed at about 4:00 PM.

4.  Most of the failures occur after some amount of processing. We update
the WARs and restart Tomcat each morning at 1:00 AM. Most failures occur
toward the end of the day, although heavy processing (or using multiple
'applications') may make them happen earlier (the earliest failure was
around 1:00 PM, on the heaviest processing day ever). It is almost as if
there is a bucket somewhere that fills up and, when full, causes the
failure. (So there is no misunderstanding: no OOM condition has ever been
reported anywhere that I can find.)
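
One way to look for that "bucket" is to watch resources that do not appear
in the heap or PermGen figures, such as the process's resident set size
(which includes native memory and thread stacks) and its thread count, both
exposed by Linux as the VmRSS and Threads fields of /proc/<pid>/status. A
minimal Linux-only sketch; the class name NativeFootprint is hypothetical.
A steadily rising VmRSS or thread count while heap usage stays flat would
point away from a Java heap leak and toward native growth.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class; assumes a Linux /proc filesystem.
public final class NativeFootprint {

    /**
     * Returns the first number after "field:" in /proc/<pid>/status text
     * (e.g. "VmRSS:   123456 kB" gives 123456), or -1 if the field is absent.
     */
    static long parseField(String statusText, String field) {
        String prefix = field + ":";
        for (String line : statusText.split("\n")) {
            if (line.startsWith(prefix)) {
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1L;
    }

    static String readFile(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader r = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            r.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass Tomcat's PID as the first argument; defaults to this JVM ("self").
        String pid = args.length > 0 ? args[0] : "self";
        String status = readFile("/proc/" + pid + "/status");
        System.out.println("VmRSS=" + parseField(status, "VmRSS") + "kB threads="
                + parseField(status, "Threads"));
    }
}
```

Run periodically (for example from cron) against the Tomcat PID, this gives
a record of native memory and thread growth up to the moment of a failure.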

Observations (or random musings):

The fact that the failures occur after some amount of processing suggests
that the issue is related to memory usage and is potentially caused by a
memory leak in the application. However, 1) I have never seen (in
VisualVM) any issue with either the heap or PermGen, and the incremental GCs
reported in catalina.out look fairly normal, and 2) top, vmstat, the system
monitor, etc. are not showing any memory issues.

The failures look a lot like the work of the Linux OOM killer (which Mark
or Chris suggested way back at the beginning, now 2-3 months ago). Does
anyone know where I could find information on tracing the Linux signals
that could cause this?
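
If the OOM killer is responsible, the kernel normally records it in its ring
buffer (visible with dmesg) and, depending on the syslog configuration, in a
file such as /var/log/messages or /var/log/syslog. The exact wording varies
between kernel versions, but lines of the form "Killed process <pid>
(<name>)" are common. A hedged sketch that scans such a log for kills of a
process named java; the class name, the default path and the regular
expression are assumptions to adapt to the local kernel's output.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the log wording it matches is an assumption.
public final class OomKillScan {

    // Assumed kernel wording: "Killed process <pid> (<name>)".
    private static final Pattern KILLED =
            Pattern.compile("Killed process (\\d+) \\(([^)]+)\\)");

    /** Returns the log lines recording the kernel killing a process named procName. */
    static List<String> findKills(List<String> logLines, String procName) {
        List<String> hits = new ArrayList<String>();
        for (String line : logLines) {
            Matcher m = KILLED.matcher(line);
            if (m.find() && m.group(2).equals(procName)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; kernel messages may land elsewhere depending on syslog.conf.
        String path = args.length > 0 ? args[0] : "/var/log/syslog";
        List<String> lines = new ArrayList<String>();
        BufferedReader r = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            r.close();
        }
        for (String hit : findKills(lines, "java")) {
            System.out.println(hit);
        }
    }
}
```

Saving the output of dmesg to a file and passing that path as the argument
covers the case where syslog does not capture kernel messages.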
