tomcat-users mailing list archives

From André Warnier
Subject Re: Notification strategy for OutOfMemoryError
Date Tue, 10 Dec 2013 09:21:59 GMT
Christopher Schultz wrote:
> Bill,
> On 12/9/13, 5:38 PM, Bill Davidson wrote:
>> Last week, one of my servers got an OutOfMemoryError at
>> approximately 1:21pm.
> :(
> It's worth pointing out that this is not a trivial issue.
>> My monitoring software which does a heart beat check once per
>> minute did not notice until 3:01pm.  Heart beat kept working for
>> over an hour and a half.
> Was it a transient error, or a chronic condition? A single thread can,
> for instance, spew objects into its stack or eden space, exhausting
> memory, but when that thread hits the OOME, all those objects become
> unreachable and get collected, which basically recovers from the situation.
> If, instead, you fill up some shared cache, buffer, etc. and NO
> threads can get more memory, then you're basically toast.
> Which of the above was it?
>> During that time my high-capacity, high-availability 24/7 application
>> was getting occasional OutOfMemoryErrors until memory got bad
>> enough that even the heart beat check servlet failed.  Apparently
>> some things that allocate large chunks of memory started failing
>> first, but none of my customers called to complain.  Smaller stuff
>> continued to work.  I didn't know until my monitoring software
>> sent me an email about the heart beat failure.
>> That doesn't work for me.  I need to know sooner.
> +1
>> I thought of trying to handle it with error-page in web.xml.
>> Apparently that does not work.  I used java.lang.Throwable as the
>> exception-type. I was already using this for a number of common
>> exceptions to send me email.
> In most OOME situations, your recovery options are limited... because
> the JVM might need to allocate (a small amount of) memory in order to
> even report the error.
>> I see the OutOfMemoryErrors logged in my catalina.out
>> Is there some way that I can catch this so that I can send email or
>> something?  I need to know as soon as possible so that I can 
>> attempt diagnosis and restart the server.  Google has not been
>> helpful. Everything says that you have to fix the memory leak.
>> Duh.  I know that. We've fixed many over the years.  We haven't had
>> one in nearly 2 years. We thought we'd fixed them all.  We need to
>> find out about them sooner when they do happen.
> There are a bunch of things you can try to do. They all have their
> caveats, failure scenarios, and inefficacies.
> 1. Use -XX:OnOutOfMemoryError="cmd args;cmd args"
> Rig this to email you, register a passive-check data point with your
> monitoring server, etc. Just remember that OOMEs happen for a number
> of reasons. You could have run out of file handles or you could have
> run out of heap space.
> 2. Use JMX monitoring, set java.lang:MemoryPool/[heap
> space]/UsageThreshold to whatever byte value you want to set as your
> limit. Then, check java.lang:MemoryPool/[heap
> space]/UsageThresholdExceeded to see if it is true. If so, your usage
> threshold has been exceeded.
> Note that this is not proof-positive that an OOME occurred. It's also
> tough to tell what value to use for the threshold. You can't really
> set it to MaxHeap - 1 byte, because you'll never get that value in
> practice. If you set it too low, you'll get warnings all the time when
> your heap usage rises in the normal course of business.
> 3. Catch OutOfMemoryError in a filter and set an application attribute.
> Check this attribute from your monitor.
> I've been considering doing this, because I can rig it so that the
> error handler does not actually require any memory to run. The problem
> is that sometimes OOMEs interrupt one thread and not another. You may
> not catch the OOME in that thread -- it may happen in a background
> thread that does not go through the filter.
> 4. You can do what I do: simply look at your total heap space by
> inspecting java.lang:Memory/HeapMemoryUsage["used"] and set a
> threshold that will cause your monitor to alarm for WARNING and
> CRITICAL conditions. You may recover and not have to check anything.
> These days, I get a false-alarm about once every 3 weeks when the heap
> space grows a hair higher than usual before a full GC runs and clears
> everything out.
> The nice thing about #4 is that you can find out early if you *might*
> be having a problem. Then you can keep an eye on your service to make
> sure it "recovers". If it never OOMEs, great. If it does, you can
> manually restart or whatever. If it OOMEs, and #1-#3 above fail
> because memory might be required to actually execute the
> do-this-thing-on-OOME action, then you might never get notified. With
> #4, you don't have to wait until an OOME to take action.
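For option 1 above, the command given to -XX:OnOutOfMemoryError can be any program. A minimal sketch, assuming a hypothetical class OomAlert launched as -XX:OnOutOfMemoryError="java -cp /opt/oom-alert OomAlert %p" (the path, class name and marker file are made up for illustration; the JVM replaces %p with its own process id). It runs in a separate JVM, so it does not need heap from the one that failed, and it only appends a line to a file that a monitor can poll:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

/**
 * Hypothetical hook target for:
 *   -XX:OnOutOfMemoryError="java -cp /opt/oom-alert OomAlert %p"
 * Appends one line per event to a marker file that a monitor can check.
 */
public class OomAlert {

    // Hypothetical location; point it wherever your monitoring agent looks.
    static final Path DEFAULT_MARKER = Paths.get("/tmp/tomcat-oome.log");

    /** Appends "<timestamp> pid=<pid> OutOfMemoryError" to the marker file. */
    static void record(Path marker, String pid) throws IOException {
        String line = Instant.now() + " pid=" + pid + " OutOfMemoryError"
                + System.lineSeparator();
        Files.write(marker, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is whatever the JVM substituted for %p (the failing JVM's pid).
        String pid = args.length > 0 ? args[0] : "unknown";
        record(DEFAULT_MARKER, pid);
    }
}
```

Per the HotSpot documentation, the command runs when an OutOfMemoryError is first thrown, so later OOMEs in the same JVM should not be expected to trigger it again.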
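Option 2 can also be tried in-process with the standard java.lang.management API. The sketch below (the class name and the 90% figure are illustrative only) arms a usage threshold on each heap pool that supports one and reports which pools are currently over it. A remote monitor would read the same UsageThreshold and UsageThresholdExceeded attributes on the java.lang:type=MemoryPool,name=... MBeans over JMX; the actual pool names depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapThresholdCheck {

    /** Sets UsageThreshold = fraction * max (fraction in (0, 1]) on each heap pool that supports it. */
    static List<MemoryPoolMXBean> armThresholds(double fraction) {
        List<MemoryPoolMXBean> armed = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isValid() || pool.getType() != MemoryType.HEAP
                    || !pool.isUsageThresholdSupported()) {
                continue; // young-generation pools typically do not support it
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // max of -1 means "undefined": nothing to take a fraction of
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            armed.add(pool);
        }
        return armed;
    }

    /** Names of the armed pools whose usage is currently above their threshold. */
    static List<String> exceeded(List<MemoryPoolMXBean> pools) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : pools) {
            if (pool.isUsageThresholdExceeded()) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<MemoryPoolMXBean> pools = armThresholds(0.90); // arbitrary example value
        System.out.println("Armed " + pools.size() + " pool(s); exceeded: " + exceeded(pools));
    }
}
```

As noted above, choosing the fraction is the hard part: too high and it never fires, too low and it fires during normal heap growth.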
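A sketch of option 3, with two deliberate substitutions: it is written against a plain Runnable instead of the Servlet API so that it compiles on its own, and the flag is a preallocated static AtomicBoolean rather than an application attribute, so setting it should not require a new allocation. In a real Filter the same try/catch would wrap chain.doFilter(request, response). The class and method names are hypothetical, and, as pointed out above, this only sees OOMEs thrown on threads that pass through it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Stand-alone sketch of "catch the OOME and set a flag".
 * In a servlet Filter, guard(...) would correspond to wrapping chain.doFilter(...).
 */
public class OomeFlag {

    // Allocated up front, so recording an OOME later needs no new objects.
    private static final AtomicBoolean OOME_SEEN = new AtomicBoolean(false);

    /** Runs the work; on OutOfMemoryError, sets the flag and rethrows. */
    static void guard(Runnable work) {
        try {
            work.run();
        } catch (OutOfMemoryError e) {
            OOME_SEEN.set(true); // a volatile write, no allocation
            throw e;             // let the container log/handle it as before
        }
    }

    /** What a heartbeat servlet or monitor would read. */
    static boolean oomeSeen() {
        return OOME_SEEN.get();
    }
}
```

Rethrowing keeps the existing behaviour (logging to catalina.out, error pages) unchanged; the flag is just an extra, cheap signal for the heartbeat check.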
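Option 4 boils down to comparing the "used" value of HeapMemoryUsage against two levels. A minimal in-process sketch; the class name and the 80%/90% WARNING/CRITICAL levels (expressed as fractions of max heap rather than absolute byte counts) are placeholders, and a real monitor would more likely read the java.lang:type=Memory MBean remotely over JMX:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageCheck {

    enum Status { OK, WARNING, CRITICAL }

    /** Classifies used/max against two fractional thresholds (warnAt < critAt). */
    static Status classify(long used, long max, double warnAt, double critAt) {
        if (max <= 0) {
            return Status.OK; // max undefined (-1): nothing to compare against
        }
        double ratio = (double) used / max;
        if (ratio >= critAt) return Status.CRITICAL;
        if (ratio >= warnAt) return Status.WARNING;
        return Status.OK;
    }

    public static void main(String[] args) {
        // Same data the java.lang:type=Memory MBean exposes as HeapMemoryUsage.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // Placeholder levels; tune them to your application's normal baseline.
        Status s = classify(heap.getUsed(), heap.getMax(), 0.80, 0.90);
        System.out.println(s + " used=" + heap.getUsed() + " max=" + heap.getMax());
    }
}
```

Because "used" includes garbage that a full GC has not yet cleared, an occasional WARNING that goes away on its own is expected, which matches the false alarms described above.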

Here is another discussion of the matter :

and another :

Based on :
 >> I see the OutOfMemoryErrors logged in my catalina.out
If so, can't you "pipe" your catalina.out through a program that will inspect each line 
(in real-time), and when it sees such a line, immediately send a signal somewhere ?
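The piping idea above could look like the following filter. It is a sketch under assumptions: the class name is hypothetical, it matches on the literal string java.lang.OutOfMemoryError, and the alert is a placeholder that just writes to stderr. It copies every line through unchanged, so it can sit in a pipe in front of the log file, or simply be fed with something like tail -F catalina.out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical line filter: copies input to output unchanged and
 * fires an alert whenever a line mentions java.lang.OutOfMemoryError.
 */
public class OomeLogWatcher {

    static final String MARKER = "java.lang.OutOfMemoryError";

    /** Copies in to out line by line; returns how many lines matched MARKER. */
    static int pump(BufferedReader in, PrintStream out, Runnable alert) throws IOException {
        int hits = 0;
        String line;
        while ((line = in.readLine()) != null) {
            out.println(line); // keep the log itself intact
            out.flush();
            if (line.contains(MARKER)) {
                hits++;
                alert.run();   // e.g. send mail, touch a file, ping the monitor
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        // Placeholder alert; replace with your own notification mechanism.
        pump(in, System.out, () -> System.err.println("ALERT: OutOfMemoryError seen in log"));
    }
}
```

One OOME can produce more than one matching line (for example a "Caused by:" entry, or the same error logged twice), so a real alert would probably want some rate limiting.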
