Message-ID: <8896332.512541282585102961.JavaMail.jira@thor>
Date: Mon, 23 Aug 2010 13:38:22 -0400 (EDT)
From: "Peter Schuller (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] Commented: (CASSANDRA-1014) GC storming, possible memory leak
In-Reply-To: <32174055.140911271953191087.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/CASSANDRA-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901497#action_12901497 ]

Peter Schuller commented on
CASSANDRA-1014:
-------------------------------------------

I have not read anything about this beyond what is in this ticket, and the beginning of the ticket is old, so this may be moot, but a couple of things:

* The first attached graph (1014-2Gheap.png) looks to me like the JVM is only doing young generation collections and is never running a concurrent mark/sweep phase. That would be a VM bug (or broken VM options).

* Is the 60 MB vs. 368 MB the difference between a CMS full collection and a stop-the-world full collection? I.e., was it 368 MB right after a full CMS sweep? That need not indicate a VM bug; CMS maintains its old gen in a non-compacting, non-copying fashion, so the CMS old gen is susceptible to fragmentation overhead. A full stop-the-world GC, AFAIK, also implies a compacting GC. A factor of 6.1 seems like a lot, though, and I don't know how the CMS free space management works. If the 6.1 is explained by fragmentation, my initial guess would be that large allocations are the triggering factor.

> GC storming, possible memory leak
> ---------------------------------
>
>                 Key: CASSANDRA-1014
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1014
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>         Environment: debian lenny amd64 OpenJDK 64-Bit Server VM (build 1.6.0_0-b11, mixed mode)
>            Reporter: Brandon Williams
>             Fix For: 0.7.0
>
>         Attachments: 1014-2Gheap.png, 1014-commitlog-v2.tar.gz, 1014-table.diff, 724-0001.png, gc2.png
>
>
> There appears to be a GC issue due to memory pressure in the 0.6 branch. You can see this by starting the server and performing many inserts. Quickly the jvm will consume most of its heap, and pauses for stop-the-world GC will begin.
> With verbose GC turned on, this can be observed as follows:
> [GC [ParNew (promotion failed): 79703K->79703K(84544K), 0.0622980 secs][CMS[CMS-concurrent-mark: 3.678/5.031 secs] [Times: user=10.35 sys=4.22, real=5.03 secs]
>  (concurrent mode failure): 944529K->492222K(963392K), 2.8264480 secs] 990745K->492222K(1047936K), 2.8890500 secs] [Times: user=2.90 sys=0.04, real=2.90 secs]
> After enough inserts (around 75-100 million) the server will GC storm and then OOM.
> jbellis and I narrowed this down to patch 0001 in CASSANDRA-724. Switching LBQ with ABQ made no difference, however using batch mode instead of periodic for the commitlog does prevent the issue from occurring. The attached screenshot shows the heap usage in jconsole first when the issue is exhibiting, a restart, and then the same amount of inserts when it does not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.