tomcat-users mailing list archives

From "Carl" <c...@etrak-plus.com>
Subject Re: Tomcat dies suddenly
Date Fri, 12 Feb 2010 19:47:29 GMT
Tony,

I tried stressing it with JMeter and came up with no results.  I could push it 
hard enough to force an OOM, but it performed/failed as expected, leaving 
tracks all over the place.  The stressing was not very sophisticated (just a 
couple of the production JSPs) but, like I said, it didn't show anything (I 
was really testing to see whether the problem was in GC... it wasn't.)  I might 
rig up a more comprehensive test... will see after I try Chris and Peter's 
ideas.
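
A minimal sketch of that kind of non-GUI JMeter run (the plan and result 
file names are placeholders, not the actual test plan used):

# run the plan headless and record sample results for later review;
# -Jthreads only has an effect if the plan reads it via ${__P(threads)}
jmeter -n -t stress-plan.jmx -l results.jtl -Jthreads=100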

Thanks,

Carl

----- Original Message ----- 
From: <anthonyvierra@gmail.com>
To: "Tomcat Users List" <users@tomcat.apache.org>
Sent: Friday, February 12, 2010 12:07 PM
Subject: Re: Tomcat dies suddenly


> Is it possible to run this server with a basic Tomcat application under
> load to rule out the application causing the crash?
>
> On Fri, Feb 12, 2010 at 4:20 AM, Carl <carl@etrak-plus.com> wrote:
>
>> This problem continues to plague me.
>>
>> A quick recap so you don't have to search your memory or archives.
>>
>> The 10,000-foot view:  new Dell T105 and T110, Slackware 13.0 (64 bit),
>> latest Java (64 bit) and latest Tomcat.  The machines only run Tomcat and a
>> small, special-purpose Java server (which I have also moved to another
>> machine to make certain it wasn't causing any problems.)  Periodically,
>> Tomcat just dies, leaving no tracks in any log that I have been able to
>> find.  The application has run on a Slackware 12.1 (32 bit) server for
>> several years without problems (except for application bugs.)  I have run
>> Memtest86 for 30 hours on the T110 with no problems reported.
>>
>> More details: the Dell T105 has an AMD processor and (currently) 8 GB of
>> memory.  The T110 has a Xeon 3440 processor and 4 GB of memory.  The
>> current Java version is 1.6.0_18-b07.  The current Tomcat version is 6.0.24.
>>
>> The servers are lightly loaded with less than 100 sessions active at any
>> one time.
>>
>> All of the following trials have produced the same results:
>>
>> 1.  Tried openSuse 64 bit.
>>
>> 2.  Tried 32 bit Slackware 13.
>>
>> 3.  Increased the memory in the T105 from 4 GB to 6 GB and finally to 8 GB.
>>
>> 4.  Have fiddled with the JAVA_OPTS settings in catalina.sh.  The current
>> settings are:
>>
>> JAVA_OPTS="-Xms512m -Xmx512m -XX:PermSize=384m -XX:MaxPermSize=384m
>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=/usr/local/tomcat/logs"
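>>
>> (A sketch only: a couple of extra HotSpot flags that might leave more
>> tracks if the JVM itself dies; the log paths below are illustrative, not
>> the ones actually in use.)
>>
>> # send GC output to its own file rather than relying on catalina.out
>> -Xloggc:/usr/local/tomcat/logs/gc.log
>> # put any hs_err crash file somewhere predictable (%p expands to the pid)
>> -XX:ErrorFile=/usr/local/tomcat/logs/hs_err_pid%p.log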
>>
>> I can see the incremental GC effects in both catalina.out and VisualVM.
>> Note the fairly small (512 MB) heap, but watching VisualVM indicates this
>> is sufficient (when a failure occurs, VisualVM will report the last amount
>> of memory used, and this is always well under the max in both heap and
>> permGen.)
>>
>> More information about the failures:
>>
>> 1.  They are clean kills: I can restart Tomcat immediately after a failure
>> and there is no port conflict.  As I understand it, this implies either the
>> Linux process was killed (I have manually killed the java process with
>> kill -9 and had the same result that I have observed when the system fails)
>> or Tomcat was shut down normally, e.g., using shutdown.sh (but that always
>> leaves tracks in catalina.out and I am not seeing any, so I do not believe
>> that is the case.)
>>
>> 2.  They appear to be load related.  On heavy processing days, the system
>> might fail every 15 minutes, but it has also run for up to 10 days without
>> failure under lighter processing.  I have found a way to force a more
>> frequent failure.  We have four WARs deployed (I will call them A, B, C and
>> D.)  They are all the same application, but we use this arrangement to give
>> access to different databases.  A user reaches the correct application via
>> https://xx.com/A or B, etc.  A is used for production while the others have
>> specific purposes.  Thus, A is always used while the others are used
>> periodically.  If users start coming in on B, C and/or D, the failure
>> occurs within hours (Tomcat shuts down, bringing all of the users down, of
>> course.)  Note that the failure still does not happen immediately.
>>
>> 3.  They do not appear to be caused by memory restrictions: 1) the old
>> server had only 2 GB of memory and ran well, 2) I have tried adding memory
>> to the new servers with no change in behavior, and 3) the indications from
>> top and the Slackware system monitor are that the system is not starved
>> for memory.  In fact, yesterday, running on the T105 with 8 GB of memory,
>> top never reported over 6 GB being used (0 swap being used), yet it failed
>> at about 4:00 PM.
>>
>> 4.  Most of the failures occur after some amount of processing.  We update
>> the WARs and restart the Tomcats each morning at 1:00 AM.  Most of the
>> failures occur toward the end of the day, although heavy processing (or
>> using multiple 'applications') may force one to happen earlier (the
>> earliest failure has been around 1:00 PM... it was the heaviest processing
>> day ever.)  It is almost as if there is a bucket somewhere that gets filled
>> up and, when filled, causes the failure.  (So there is no misunderstanding,
>> there has never been an OOM condition reported anywhere that I can find.)
>>
>> Observations (or random musings):
>>
>> The fact that the failures occur after some amount of processing implies
>> that the issue is related to memory usage and is potentially caused by a
>> memory leak in the application.  However, 1) I have never seen (from
>> VisualVM) any issue with either heap or permGen, and the incremental GCs
>> reported in catalina.out look pretty normal, and 2) top, vmstat, the system
>> monitor, etc. are not showing any issues with memory.
>>
>> The failures look a lot like the Linux OOM killer (which Mark or Chris
>> suggested way back at the beginning, now 2-3 months ago.)  Does anyone have
>> an idea where I could find information on tracking the Linux signals that
>> could cause this?
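>>
>> (Only a sketch of where to start looking: the OOM killer logs to the
>> kernel ring buffer and syslog, and auditd can record kill() calls.  The
>> audit key name below is just an illustrative label.)
>>
>> # look for OOM-killer activity around the time of a failure
>> dmesg | grep -i -E 'oom|killed process'
>> grep -i -E 'oom|killed process' /var/log/messages
>> # with auditd running, log every kill() syscall so the sender of a fatal
>> # signal shows up in /var/log/audit/audit.log
>> auditctl -a exit,always -F arch=b64 -S kill -k tomcat_kill_watch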
>>
>> Thanks,
>>
>> Carl
>>
>>
>>
>>
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

