From: Christopher Schultz
Date: Mon, 09 Dec 2013 18:12:00 -0500
To: Tomcat Users List
Subject: Re: Notification strategy for OutOfMemoryError
Message-ID: <52A64E40.6040105@christopherschultz.net>
In-Reply-To: <52A64675.1030600@gmail.com>

Bill,

On 12/9/13, 5:38 PM, Bill Davidson wrote:
> Last week, one of my servers got an OutOfMemoryError at
> approximately 1:21pm.

:( It's worth pointing out that this is not a trivial issue.

> My monitoring software which does a heart beat check once per
> minute did not notice until 3:01pm. Heart beat kept working for
> over an hour and a half.

Was it a transient error, or a chronic condition? A single thread can,
for instance, spew objects into its stack or eden space, exhausting
memory, but when that thread hits the OOME, all those objects are freed,
which basically recovers from the situation. If, instead, you fill up
some shared cache, buffer, etc. and NO threads can get more memory, then
you're basically toast.

Which of the above was it?

> During that time my high capacity high availability 24/7 application
> was getting occasional OutOfMemoryError's until memory got bad
> enough that even the heart beat check servlet failed. Apparently
> some things that allocate large chunks of memory started failing
> first, but none of my customers called to complain. Smaller stuff
> continued to work.
> I didn't know until my monitoring software
> sent me an email about the heart beat failure.
>
> That doesn't work for me. I need to know sooner.

+1

> I thought of trying to handle it with error-page in web.xml.
> Apparently that does not work. I used java.lang.Throwable as the
> exception-type. I was already using this for a number of common
> exceptions to send me email.

In most OOME situations, your recovery options are limited... because
the JVM might need to allocate (a small amount of) memory in order to
even report the error.

> I see the OutOfMemoryError's logged in my catalina.out
>
> Is there some way that I can catch this so that I can send email or
> something? I need to know as soon as possible so that I can
> attempt diagnosis and restart the server. Google has not been
> helpful. Everything says that you have to fix the memory leak.
> Duh. I know that. We've fixed many over the years. We haven't had
> one in nearly 2 years. We thought we'd fixed them all. We need to
> find out about them sooner when they do happen.

There are a bunch of things you can try to do. They all have their
caveats, failure scenarios, and inefficacies.

1. Use -XX:OnOutOfMemoryError="cmd args;cmd args"

Rig this to email you, register a passive-check data point with your
monitoring server, etc. Just remember that OOMEs happen for a number of
reasons. You could have run out of file handles or you could have run
out of heap space.

2. Use JMX monitoring, set
java.lang:MemoryPool/[heap space]/UsageThreshold to whatever byte value
you want to set as your limit. Then, check
java.lang:MemoryPool/[heap space]/UsageThresholdExceeded to see if it is
true. If so, your usage threshold has been exceeded.

Note that this is not proof-positive that an OOME occurred. It's also
tough to tell what value to use for the threshold. You can't really set
it to MaxHeap - 1 byte, because you'll never get that value in practice.
If you set it too low, you'll get warnings all the time when your heap
usage rises in the normal course of business.

3. catch OutOfMemoryError in a filter and set an application attribute.
Check this attribute from your monitor.

I've been considering doing this, because I can rig it so that the error
handler does not actually require any memory to run. The problem is that
sometimes OOMEs interrupt one thread and not another. You may not catch
the OOME in that thread -- it may happen in a background thread that
does not go through the filter.

4. You can do what I do: simply look at your total heap space by
inspecting java.lang:Memory/HeapMemoryUsage["used"] and set a threshold
that will cause your monitor to alarm for WARNING and CRITICAL
conditions.

You may recover and not have to check anything. These days, I get a
false-alarm about once every 3 weeks when the heap space grows a hair
higher than usual before a full GC runs and clears everything out.

The nice thing about #4 is that you can find out early if you *might* be
having a problem. Then you can keep an eye on your service to make sure
it "recovers". If it never OOME's, great. If it does, you can manually
restart or whatever.

If it OOME's, and #1-#3 above fail because memory might be required to
actually execute the do-this-thing-on-OOME action, then you might never
get notified. With #4, you don't have to wait until an OOME to take
action.
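
For what it's worth, here is a rough sketch of what #4 (and the threshold
idea from #2) might look like when done in-process with the standard
java.lang.management beans, which expose the same data you would read
over JMX from java.lang:type=Memory and java.lang:type=MemoryPool. The
class name HeapCheck, the method names, and the 0.80/0.95 ratios are all
made up for illustration; pick limits that fit your own heap behavior.
This is not what my monitor actually runs, just one way to express the
idea.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper; names and ratios are illustrative assumptions. */
final class HeapCheck {
    enum Level { OK, WARNING, CRITICAL }

    /** Pure classification of used/max, kept separate so it is easy to test. */
    static Level classify(long usedBytes, long maxBytes, double warnRatio, double critRatio) {
        if (maxBytes <= 0) {
            // MemoryUsage.getMax() is -1 when no maximum is defined.
            return Level.OK;
        }
        double ratio = (double) usedBytes / maxBytes;
        if (ratio >= critRatio) return Level.CRITICAL;
        if (ratio >= warnRatio) return Level.WARNING;
        return Level.OK;
    }

    /** #4: look at total heap "used" against "max" for this JVM. */
    static Level checkLocalHeap(double warnRatio, double critRatio) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return classify(heap.getUsed(), heap.getMax(), warnRatio, critRatio);
    }

    /**
     * #2: set a usage threshold at the given fraction (0 < ratio <= 1) of max
     * on every heap pool that supports one. Returns how many pools were armed.
     */
    static int armPoolThresholds(double ratio) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue; // not every pool supports a usage threshold
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // pool is invalid or has no defined maximum
            }
            pool.setUsageThreshold((long) (usage.getMax() * ratio));
            armed++;
        }
        return armed;
    }

    /** #2: true if any armed heap pool currently reports UsageThresholdExceeded. */
    static boolean anyPoolThresholdExceeded() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()
                    && pool.isUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("heap status: " + checkLocalHeap(0.80, 0.95));
        System.out.println("pools armed: " + armPoolThresholds(0.90));
        System.out.println("threshold exceeded: " + anyPoolThresholdExceeded());
    }
}
```

An external monitor would read the same attributes remotely and do the
classify() step on its side, which keeps the check working even when the
monitored JVM is too starved to run anything new.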
-chris