From: Daniel Woo <daniel.y.woo@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 12 Oct 2012 16:26:37 +0800
Subject: Re: cassandra 1.0.8 memory usage

Hi Rob,

>>What version of Cassandra? What JVM? Are JNA and Jamm working?
Cassandra 1.0.8, Sun JDK 1.7.0_05-b06. JNA memlock is enabled and jamm works.

>>It sounds like the two nodes that are pathological right now have exhausted the perm gen with actual non-garbage, probably mostly the Bloom filters and the JMX MBeans.
jmap shows that the perm gen is only 40% used.
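
One way to cross-check a perm gen figure like this is `jstat -gcutil <pid>`, which on a JDK 7 HotSpot JVM prints per-generation utilization percentages (the P column is perm gen, O is old gen). The sketch below is only an illustration: jstat is swapped in for jmap here, and the sample row contains made-up numbers, not output from this cluster.

```python
def parse_gcutil(output):
    """Map each jstat -gcutil column name to the value in the first data row."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header, row = lines[0].split(), lines[1].split()
    return {name: float(value) for name, value in zip(header, row)}


# Hypothetical sample in the JDK 7 column layout; the numbers are invented.
SAMPLE = """\
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  12.50  63.10  71.20  40.00   5210  812.400   12    3.100  815.500
"""

stats = parse_gcutil(SAMPLE)

# YGCT is the total young-collection time in seconds, YGC the collection count.
avg_young_pause_ms = 1000 * stats["YGCT"] / stats["YGC"]

print(f"perm gen {stats['P']:.0f}% used, old gen {stats['O']:.0f}% used")
print(f"average young collection pause: {avg_young_pause_ms:.0f} ms")
```

If P stays well below 100 while young collections remain slow, the perm gen is probably not the part of the heap under pressure.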

>>Do you have a "large" number of ColumnFamilies? How large is the data stored per node?
I have very few column families, maybe 30-50. nodetool shows each node has a 5 GB load.

>> Disable swap for cassandra node
I am going to change swappiness to 20%.

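For reference, a minimal sketch of checking the current setting on a Linux host, assuming the standard procfs path. Note that `vm.swappiness` is a tunable from 0 to 100 rather than a literal percentage. The helper names and the target of 20 are only illustrative.

```python
import os

SWAPPINESS_PATH = "/proc/sys/vm/swappiness"  # standard Linux procfs location


def parse_swappiness(text):
    """Parse the contents of the swappiness file into an integer."""
    return int(text.strip())


def describe(current, target=20):
    """Return a one-line summary comparing the current value to the target."""
    if current == target:
        return f"vm.swappiness is already {target}"
    return f"vm.swappiness is {current}, target is {target}"


# Only read the file where it exists, so the sketch also runs off Linux.
if os.path.exists(SWAPPINESS_PATH):
    with open(SWAPPINESS_PATH) as f:
        print(describe(parse_swappiness(f.read())))
else:
    print("no swappiness file here (not a Linux host?)")
```

The value itself is typically changed with `sysctl -w vm.swappiness=20` as root, and persisted in /etc/sysctl.conf.
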
Thanks,
Daniel


On Fri, Oct 12, 2012 at 2:02 AM, Rob Coli <rcoli@palominodb.com> wrote:
On Wed, Oct 10, 2012 at 11:04 PM, Daniel Woo <daniel.y.woo@gmail.com> wrote:
> I am running a mini cluster with 6 nodes, and recently we have seen very frequent
> ParNew GCs on two nodes. Each takes 200 - 800 ms on average, and sometimes
> 5 seconds. ParNew is a stop-the-world collection, and our client throws
> a SocketTimeoutException every 3 minutes.

What version of Cassandra? What JVM? Are JNA and Jamm working?

> I checked the load and it seems well balanced. The two nodes are running on
> the same hardware: 2 * 4-core Xeon with 16G RAM. We give Cassandra a 4G
> heap, including an 800MB young generation. We did not see any swap usage during
> the GC. Any idea about this?
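
The sizing quoted above works out as follows. This is only arithmetic on the numbers in the message. The mention of `MAX_HEAP_SIZE` and `HEAP_NEWSIZE` in cassandra-env.sh is an assumption about how the heap was configured, since the thread does not say.

```python
# Figures taken from the message; everything else is derived from them.
ram_mb = 16 * 1024   # 16G RAM per node
heap_mb = 4 * 1024   # 4G heap (assumed to be set via MAX_HEAP_SIZE)
young_mb = 800       # 800MB young generation (assumed HEAP_NEWSIZE)

old_mb = heap_mb - young_mb           # what is left for the old generation
heap_share = 100 * heap_mb / ram_mb   # heap as a share of physical RAM
young_share = 100 * young_mb / heap_mb

print(f"old generation: {old_mb} MB")
print(f"heap is {heap_share:.0f}% of RAM")
print(f"young generation is {young_share:.1f}% of the heap")
```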

It sounds like the two nodes that are pathological right now have
exhausted the perm gen with actual non-garbage, probably mostly the
Bloom filters and the JMX MBeans.

> Then I took a heap dump. It shows that 5 instances of JmxMBeanServer hold
> 500MB of memory and most of the referenced objects are JMX MBean related. It's
> kind of weird to me and looks like a memory leak.

Do you have a "large" number of ColumnFamilies? How large is the data
stored per node?

=Rob

--
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb



--
Thanks & Regards,
Daniel