Subject: constant CMS GC using CPU time
From: Bryan Talbot <btalbot@aeriagames.com>
To: user@cassandra.apache.org
Date: Thu, 18 Oct 2012 11:06:06 -0700

often getting "stuck" in state where CMS collections of the old space are constantly running. The JVM configuration is using the standard settings in cassandra-env -- relevant settings are included below. The max heap is currently set to 5 GB with 800MB for new size. I don't believe that the cluster is overly busy and seems to be performing well enough other than this issue. When nodes get into this state they never seem to leave it (by freeing up old space memory) without restarting cassandra. They typically enter this state while running "nodetool repair -pr" but once they start doing this, restarting them only "fixes" it for a couple of hours. Compactions are completing and are generally not queued up. All CF are using STCS. The busiest CF consumes about 100GB of space on disk, is write heavy, and all columns have a TTL of 3 days. Overall, there are 41 CF including those used for system keyspace and secondary indexes. The number of SSTables per node currently varies from 185-212. Other than frequent log warnings about "GCInspector - Heap is 0.xxx full..." and "StorageService - Flushing CFS(...) to relieve memory pressure" there are no other log entries to indicate there is a problem. Does the memory needed vary depending on the amount of data stored? If so, how can I predict how much jvm space is needed? I don't want to make the heap too large as that's bad too. Maybe there's a memory leak related to compaction that doesn't allow meta-data to be purged? -Bryan 12 GB of RAM in host with ~6 GB used by java and ~6 GB for OS and buffer cache. $> free -m total used free shared buffers cached Mem: 12001 11870 131 0 4 5778 -/+ buffers/cache: 6087 5914 Swap: 0 0 0 jvm settings in cassandra-env MAX_HEAP_SIZE="5G" HEAP_NEWSIZE="800M" # GC tuning options JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC" JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC" JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled" JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8" JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1" JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75" JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly" JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops" jstat shows about 12 full collections per minute with old heap usage constantly over 75% so CMS is always over the CMSInitiatingOccupancyFraction threshold. $> jstat -gcutil -t 22917 5000 4 Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT 132063.0 34.70 0.00 26.03 82.29 59.88 21580 506.887 17523 3078.941 3585.829 132068.0 34.70 0.00 50.02 81.23 59.88 21580 506.887 17524 3079.220 3586.107 132073.1 0.00 24.92 46.87 81.41 59.88 21581 506.932 17525 3079.583 3586.515 132078.1 0.00 24.92 64.71 81.40 59.88 21581 506.932 17527 3079.853 3586.785 Other hosts not currently experiencing the high CPU load have a heap less than .75 full. $> jstat -gcutil -t 6063 5000 4 Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT 520731.6 0.00 12.70 36.37 71.33 59.26 46453 1688.809 14785 2130.779 3819.588 520736.5 0.00 12.70 53.25 71.33 59.26 46453 1688.809 14785 2130.779 3819.588 520741.5 0.00 12.70 68.92 71.33 59.26 46453 1688.809 14785 2130.779 3819.588 520746.5 0.00 12.70 83.11 71.33 59.26 46453 1688.809 14785 2130.779 3819.588 --20cf307cfef4b5226704cc593d80 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
In a 4 node cluster running Cassandra 1.1.5 with Sun JVM 1.6.0_29-b11 (64-bit), the nodes are often getting "stuck" in a state where CMS collections of the old space are constantly running.

The JVM configuration uses the standard settings in cassandra-env -- the relevant settings are included below. The max heap is currently set to 5 GB with 800 MB for the new size. I don't believe the cluster is overly busy, and it seems to be performing well enough apart from this issue. When nodes get into this state they never seem to leave it (by freeing up old space memory) without restarting Cassandra. They typically enter this state while running "nodetool repair -pr", but once they do, restarting them only "fixes" things for a couple of hours.
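
To see what the collector is actually doing when a node gets stuck, I'm thinking of turning on GC logging with something like the lines below in cassandra-env (the log path is just an example, and this set of flags is my guess at a reasonable starting point rather than anything from the Cassandra docs):

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"    # example path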

Compactions are completing and are generally not queued up. All CF are using STCS. The busiest CF consumes about 100 GB of space on disk, is write heavy, and all columns have a TTL of 3 days. Overall, there are 41 CF including those used for the system keyspace and secondary indexes. The number of SSTables per node currently varies from 185-212.
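
To get a quick per-CF view of those SSTable counts, something like this should work (assuming cfstats still labels the lines "Column Family:" and "SSTable count:"):

$> nodetool cfstats | grep -E 'Column Family:|SSTable count:'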

Other than frequent log warnings about "GCInspector - Heap is 0.xxx full..." and "StorageService - Flushing CFS(...) to relieve memory pressure", there are no log entries to indicate a problem.

Does the memory needed vary depending on the amount of data stored? If so, how can I predict how much JVM heap is needed? I don't want to make the heap too large, as that's bad too. Maybe there's a memory leak related to compaction that doesn't allow metadata to be purged?


-Bryan


The host has 12 GB of RAM, with ~6 GB used by Java and ~6 GB for the OS and buffer cache.
$> free -m
             total       used       free     shared    buffers     cached
Mem:         12001      11870        131          0          4       5778
-/+ buffers/cache:       6087       5914
Swap:            0          0          0
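
For reference, my understanding of the auto-sizing in cassandra-env.sh (from memory, so treat the 1024/8192 caps below as my assumptions) is that with 12 GB of RAM it would have picked roughly a 3 GB heap if MAX_HEAP_SIZE were left unset:

# sketch of the default heap calculation, not the actual script
system_memory_in_mb=12001
half=$(( system_memory_in_mb / 2 ));    [ $half -gt 1024 ] && half=1024
quarter=$(( system_memory_in_mb / 4 )); [ $quarter -gt 8192 ] && quarter=8192
max_heap_mb=$(( half > quarter ? half : quarter ))    # ~3000 MB on this host
echo "auto-sized MAX_HEAP_SIZE would be ${max_heap_mb}M"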


JVM settings in cassandra-env:
MAX_HEAP_SIZE="5G"
HEAP_NEWSIZE="800M"

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"


jstat shows about 12 full collections per minute with old heap usage constantly over 75%, so CMS is always over the CMSInitiatingOccupancyFraction threshold.

$> jstat -gcutil -t 22917 5000 4
Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
        132063.0  34.70   0.00  26.03  82.29  59.88  21580  506.887 17523 3078.941 3585.829
        132068.0  34.70   0.00  50.02  81.23  59.88  21580  506.887 17524 3079.220 3586.107
        132073.1   0.00  24.92  46.87  81.41  59.88  21581  506.932 17525 3079.583 3586.515
        132078.1   0.00  24.92  64.71  81.40  59.88  21581  506.932 17527 3079.853 3586.785
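
A quick way to watch how long the old gen stays above that 75% threshold might be something like this (assuming that with -t the O column is the fifth field, as in the header above):

$> jstat -gcutil -t 22917 5000 | awk 'NR > 1 && $5 > 75 { print "t=" $1 "s old gen at " $5 "%" }'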


Other hosts that are not currently experiencing the high CPU load have a heap that is less than 75% full.

$> jstat -gcutil -t 6063 5000 4
Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
        520731.6   0.00  12.70  36.37  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
        520736.5   0.00  12.70  53.25  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
        520741.5   0.00  12.70  68.92  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
        520746.5   0.00  12.70  83.11  71.33  59.26  46453 1688.809 14785 2130.779 3819.588

