tomcat-users mailing list archives

From Daniel Mikusa <dmik...@vmware.com>
Subject Re: Help in diagnosing server unresponsiveness
Date Wed, 20 Feb 2013 15:35:52 GMT
On Feb 20, 2013, at 3:52 AM, Zoran Avtarovski wrote:

> Hi Guys,
> 
> It's been a while but the nature of this problem means it may be a while
> between crashes. But we just had a big one which hung the system and
> required a reboot.

Can you elaborate on this?  What OS are you running?  What do you mean by "hung the system"?
Did you get a kernel panic / BSOD?

> I have changed the tomcat options as follows inline with all the advice
> and material I read to be as follows:

This can be dangerous, especially if you haven't tested the settings and verified that they
help to increase performance and lower GC overhead for your system and applications.  Applications
are unique, and what works to tune one of them may not work well for others.

> 
> -server -Xms1460m -Xmx11460m -Djava.awt.headless=true
> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> -XX:MaxPermSize=512M -XX:NewSize=4500m -XX:+CMSClassUnloadingEnabled
> -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit
> -XX:CMSInitiatingOccupancyFraction=80 -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -Xloggc:/usr/local/tomcat/logs/gc.log

First, what JVM are you running (vendor and version)?  If you are running anything but the
latest version of that JVM, upgrade to the latest version and see if the problem is still present.

Some comments on your JVM options...

1.) You have -XX:+UseConcMarkSweepGC listed twice

2.) You have -XX:+CMSIncrementalMode.  Does the following describe your system?  If not, remove
this setting.

"This feature is useful when applications that need the low pause times provided by the concurrent
collector are run on machines with small numbers of processors (e.g., 1 or 2)."

  http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms

3.) I'm not a fan of specifying "-XX:NewSize=4500m".  I think the JVM's default usually works
fine, plus it's difficult to manually specify this value and get it correct.  My suggestion
would be to remove this option, unless you have load tested your application with and without
the setting and you can 100% guarantee that it is helping performance.

4.) You have set -XX:-UseGCOverheadLimit, which could be dangerous.  "This feature is designed
to prevent applications from running for an extended period of time while making little or
no progress because the heap is too small."

  http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#cms.oom

Disabling this would seem unnecessary if your JVM options are tuned correctly.

5.) This option, -XX:CMSInitiatingOccupancyFraction, is another one where I would suggest
using the JVM default, unless you have load tested with and without the setting and can guarantee
that setting this value improves performance.
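
As a rough illustration of the five comments above, here is a small Python sketch that scans an option string for exact duplicates and for the flags questioned in this message.  The helper names (audit, option_name, QUESTIONED) are made up for this example, and the reasons in the table are just my comments restated, not anything the JVM reports.

```python
from collections import Counter

# Options copied from the quoted message.
JAVA_OPTS = """
-server -Xms1460m -Xmx11460m -Djava.awt.headless=true
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
-XX:MaxPermSize=512M -XX:NewSize=4500m -XX:+CMSClassUnloadingEnabled
-XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit
-XX:CMSInitiatingOccupancyFraction=80 -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:/usr/local/tomcat/logs/gc.log
"""

# Hypothetical lookup table: flags questioned above and the reason given.
QUESTIONED = {
    "-XX:+CMSIncrementalMode": "meant for machines with 1 or 2 processors",
    "-XX:NewSize": "JVM default usually works fine",
    "-XX:-UseGCOverheadLimit": "turns off a safety check",
    "-XX:CMSInitiatingOccupancyFraction": "prefer the default unless load tested",
}


def option_name(opt):
    """Drop the value part, e.g. -XX:NewSize=4500m -> -XX:NewSize."""
    return opt.split("=", 1)[0]


def audit(opts_text):
    """Return (options listed more than once, options worth a second look)."""
    opts = opts_text.split()
    counts = Counter(opts)
    duplicates = sorted(o for o, n in counts.items() if n > 1)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    questioned = [o for o in dict.fromkeys(opts) if option_name(o) in QUESTIONED]
    return duplicates, questioned


if __name__ == "__main__":
    dups, flagged = audit(JAVA_OPTS)
    print("listed more than once:", dups)
    for opt in flagged:
        print(opt, "->", QUESTIONED[option_name(opt)])
```

This only compares strings.  It says nothing about whether a given value is right for your application, which is what the load testing is for.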

> 
> The garbage collection log had the following details just prior to the
> crash:
> 
> 4163.757: [GC [1 CMS-initial-mark: 0K(5376K)] 1834200K(4152576K),
> 1.9237250 secs] [Times: user=1.92 sys=0.00, real=1.92 secs]
> 4165.682: [CMS-concurrent-mark-start]
> 4165.834: [CMS-concurrent-mark: 0.152/0.152 secs] [Times: user=0.15
> sys=0.00, real=0.16 secs]
> 4165.834: [CMS-concurrent-preclean-start]
> 4165.849: [CMS-concurrent-preclean: 0.015/0.015 secs] [Times: user=0.01
> sys=0.00, real=0.01 secs]
> 4165.849: [CMS-concurrent-abortable-preclean-start]
> CMS: abort preclean due to time 4171.285:
> [CMS-concurrent-abortable-preclean: 5.035/5.436 secs] [Times: user=5.05
> sys=0.00, real=5.44 secs]
> 4171.285: [GC[YG occupancy: 1834200 K (4147200 K)]4171.286: [Rescan
> (parallel) , 1.5184720 secs]4172.804: [weak refs processing, 0.0001420
> secs]4172.804: [class unloading, 0.0118860 secs]4172.816: [scrub symbol &
> string tables, 0.0141570 secs] [1 CMS-remark: 0K(5376K)]
> 1834200K(4152576K), 1.5484470 secs]
> 
> 	
> And the JavaMelody monitoring indicated the crash occurred at the same
> time as garbage collection took place. Basically the Garbage collector
> time chart peaked at 20 and ran for about 15 minutes.
> 
> 
> I had a look at the garbage collector chart over a longer period and when
> the collector runs more frequently it appears to be more stable.
> 
> Any advice on where to go next?

1.) Look at the "Basic Troubleshooting" section here.

   http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms.troubleshooting

2.) If possible, take some heap dumps when you start to notice a problem.  Then you can analyze
them with a profiler and see what is happening in the heap.

3.) Load test with a profiler hooked directly up to your application.  Try to recreate the
problem.
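
While you are collecting that data, it can also help to pull the stop-the-world CMS pauses out of gc.log so you can line them up with the JavaMelody charts.  Below is a minimal Python sketch, assuming the log format matches the excerpt quoted above with each entry on one line (the mail client wrapped them).  The regular expression and the 1.0 second threshold are my own choices for illustration, and -XX:+PrintGCDetails output differs between JVM versions, so expect to adjust the pattern.

```python
import re

# Matches the two stop-the-world CMS phases and captures the pause length,
# e.g. "[1 CMS-remark: 0K(5376K)] 1834200K(4152576K), 1.5484470 secs]".
PAUSE = re.compile(
    r"CMS-(initial-mark|remark): [^\]]*\]\s+\S+,\s+(\d+\.\d+) secs\]"
)


def cms_pauses(log_text, threshold_secs=1.0):
    """Return (phase, seconds) for each CMS pause at or above the threshold."""
    pauses = []
    for match in PAUSE.finditer(log_text):
        seconds = float(match.group(2))
        if seconds >= threshold_secs:
            pauses.append((match.group(1), seconds))
    return pauses


# Entries taken from the quoted log, re-joined onto single lines.
SAMPLE = (
    "4163.757: [GC [1 CMS-initial-mark: 0K(5376K)] 1834200K(4152576K), "
    "1.9237250 secs] [Times: user=1.92 sys=0.00, real=1.92 secs]\n"
    "4165.682: [CMS-concurrent-mark-start]\n"
    "4171.285: [GC[YG occupancy: 1834200 K (4147200 K)]4171.286: [Rescan "
    "(parallel) , 1.5184720 secs]4172.804: [weak refs processing, 0.0001420 "
    "secs]4172.804: [class unloading, 0.0118860 secs]4172.816: [scrub symbol & "
    "string tables, 0.0141570 secs] [1 CMS-remark: 0K(5376K)] "
    "1834200K(4152576K), 1.5484470 secs]\n"
)

if __name__ == "__main__":
    for phase, seconds in cms_pauses(SAMPLE):
        print(f"{phase}: {seconds:.3f} secs")
```

To run it against the real file, read /usr/local/tomcat/logs/gc.log (the path from your -Xloggc option) and pass its contents to cms_pauses().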

Hope that helps.

Dan



> 
> 
> Z.
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org


