tomcat-users mailing list archives

From "Carl" <>
Subject Re: Tomcat dies suddenly
Date Wed, 24 Feb 2010 11:08:00 GMT

Thanks for the thoughts.

My first thought was that the problem was hardware related and that the 
reason I could not see the problem was that the memtest86 did not 
sufficiently stress the machine to change the temperature enough to cause a 
failure.  Subsequently, I built up another server with entirely different 
architecture (AMD vs Intel, different memory, different disks, etc.) and it 
failed in exactly the same manner.  I have added memory to this second 
server just to test that we were not running out of memory by some fluke 
but those tests failed in exactly the same manner.  My conclusion is that 
the problem is not hardware but rather either the Sun JVM (the only one I 
have used) or an errant piece of native code somewhere.  I have tried to 
find the errant code by reducing the machine to the absolute minimum 
required to run the application and that also showed the same failure. 
Chris suggested using strace (which I have) but I inadvertently overwrote 
the file containing the failure (not one of my brighter moves).
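
To avoid clobbering the trace next time, something like the following could 
help. It is only a minimal sketch: the helper name strace_command, the 
/var/tmp default directory and the file naming scheme are assumptions made 
for illustration, not something from the earlier messages. It builds an 
strace command line whose output file carries the PID and a timestamp, so 
each run writes to a new file.

```python
import datetime
import os


def strace_command(pid, log_dir="/var/tmp", now=None):
    """Build an strace command whose output file name is unique per run.

    strace_command, log_dir and the naming scheme are hypothetical; only the
    strace options themselves (-f, -tt, -o, -p) are standard.
    """
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    log_path = os.path.join(log_dir, "strace-%d-%s.log" % (pid, stamp))
    # -f  follow child processes and threads (the JVM has many threads)
    # -tt prefix each line with a wall-clock time including microseconds
    # -o  write the trace to a file instead of stderr
    # -p  attach to an already running process
    return ["strace", "-f", "-tt", "-o", log_path, "-p", str(pid)]
```

The returned list can be passed to subprocess.run() as root, with the PID 
of the running Tomcat JVM as the argument. Because the timestamp is part of 
the name, starting a new trace does not replace the previous file.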


----- Original Message ----- 
From: "George Sexton" <>
To: "'Tomcat Users List'" <>
Sent: Tuesday, February 23, 2010 7:45 PM
Subject: RE: Tomcat dies suddenly

> -----Original Message-----
> From: Carl []
> Sent: Tuesday, February 23, 2010 5:09 AM
> To: Tomcat Users List
> Subject: Re: Tomcat dies suddenly
> Just an update.
> After 8 1/2 days, on the newly built Slackware machine with the JRE in the
> Slackware distribution removed before installing the operating system and
> using the newest version of the mysql-connector, the system failed in
> exactly the same fashion as the previous attempts: ran beautifully right up
> to the point of failure and the failure was the JVM being stopped with a
> reported seg fault.
> Changed this server to the IBM JVM.  Tested it locally (directly accessed
> the IP within the DMZ) and it worked great.  Switched it to production early
> this morning (4:30AM before people start coming onto the system) and
> everything seemed good.  Then, specific customers (the rest were able to
> come in just fine) started getting 404's (we use only https, didn't have a


Just out of curiosity, have you tried building out machines with DIFFERENT 
hardware? E.g. building out a server using an IBM or HP computer, rather 
than the ones you already have. If I recall correctly, you started this 
thread out with SIG 11's.

SIG 11's on Linux are quite often hardware problems. I know you've done 
memtest, but sometimes that's not enough. Here's a link to a problem I had:

To make a long story short, there was random disk corruption that was 
happening. When I stopped using the on-board controller and went to a PCI 
one, the computer would reboot itself under heavy load. Some of the static 
burn-in utilities can miss hardware defects because they don't actually 
stress the system (e.g. power, CPU, disks, etc.). The problem was a specific 
rev of a specific motherboard.

I think you need to step back, get a computer from a different manufacturer 
and test. You've tried different OS's, different JVMs, different everything 
except different hardware. By your own admission, the app used to run 
flawlessly on an older server.

When you've eliminated everything else, the thing that remains, however 
unlikely, must be the culprit.

George Sexton
MH Software, Inc.
Voice: 303 438 9585

To unsubscribe, e-mail:
For additional commands, e-mail:
