Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of btalbot@aeriagames.com
 designates 74.125.149.244 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAPmnz8EgZ37yFOZDnL0VY9E_AbJpqBkmGfRLc46PJYE+peNBaw@mail.gmail.com>
References: 
 <CAPmnz8EgZ37yFOZDnL0VY9E_AbJpqBkmGfRLc46PJYE+peNBaw@mail.gmail.com>
Date: Fri, 19 Oct 2012 10:59:02 -0700
Message-ID: 
 <CAPmnz8FaZbgw2M=Z2T10a5Bg3JBmT2wavT_xDyatzEPW2wTDQw@mail.gmail.com>
Subject: Re: constant CMS GC using CPU time
From: Bryan Talbot <btalbot@aeriagames.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=047d7b62461e4fc1e004cc6d4200

--047d7b62461e4fc1e004cc6d4200
Content-Type: text/plain; charset=UTF-8

ok, let me try asking the question a different way ...

How does Cassandra use memory, and how can I plan how much is needed?  I
have a 1 GB memtable and a 5 GB total heap, and that's still not enough even
though the number of concurrent connections and the garbage generation rate
are fairly low.

If I were using MySQL or Oracle, I could compute how much memory could be
used by N concurrent connections, how much is allocated for caching, temp
space, etc.  How can I do this for Cassandra?  Currently it seems like the
memory used scales with the number of bytes stored and not with how busy
the server actually is.  That's not such a good thing.
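
For reference, this is the back-of-the-envelope arithmetic I can do from the
settings quoted below.  It is only a sketch: it assumes that memtables and
anything else long-lived end up in the old generation, and the function and
parameter names are my own, not anything from Cassandra.

```python
def heap_budget(heap_mb, new_gen_mb, cms_fraction, memtable_mb):
    """Return a rough old-generation budget in MB.

    Assumption (not verified against Cassandra internals): memtables are
    long-lived enough to sit in the old generation.
    """
    # Old generation is whatever the heap has left after the new generation.
    old_gen_mb = heap_mb - new_gen_mb
    # CMS starts a cycle once old-generation occupancy crosses this point.
    cms_trigger_mb = old_gen_mb * cms_fraction
    return {
        "old_gen_mb": old_gen_mb,
        "cms_trigger_mb": cms_trigger_mb,
        # Room left for long-lived data other than memtables.
        "headroom_mb": cms_trigger_mb - memtable_mb,
    }


# Values taken from this thread: 5G heap, 800M new size,
# CMSInitiatingOccupancyFraction=75, 1 GB of memtable space.
budget = heap_budget(heap_mb=5 * 1024, new_gen_mb=800,
                     cms_fraction=0.75, memtable_mb=1024)
for name, value in budget.items():
    print(f"{name}: {value:.0f} MB")
```

With these settings the old generation is 4320 MB and CMS starts at 3240 MB,
so under the assumption above only about 2.2 GB of long-lived data other than
memtables fits before CMS would be triggered again right after each cycle.
What I can't work out is how to estimate that remaining part.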

-Bryan


On Thu, Oct 18, 2012 at 11:06 AM, Bryan Talbot <btalbot@aeriagames.com> wrote:

> In a 4 node cluster running Cassandra 1.1.5 with Sun JVM 1.6.0_29-b11
> (64-bit), the nodes are often getting "stuck" in a state where CMS
> collections of the old space are constantly running.
>
> The JVM configuration is using the standard settings in cassandra-env --
> relevant settings are included below.  The max heap is currently set to 5
> GB with 800 MB for new size.  I don't believe that the cluster is overly
> busy, and it seems to be performing well enough other than this issue.  When
> nodes get into this state they never seem to leave it (by freeing up old
> space memory) without restarting Cassandra.  They typically enter this
> state while running "nodetool repair -pr", but once they start doing this,
> restarting them only "fixes" it for a couple of hours.
>
> Compactions are completing and are generally not queued up.  All CFs use
> STCS.  The busiest CF consumes about 100 GB of space on disk, is write
> heavy, and all of its columns have a TTL of 3 days.  Overall, there are 41
> CFs including those used for the system keyspace and secondary indexes.  The
> number of SSTables per node currently varies from 185 to 212.
>
> Apart from frequent log warnings about "GCInspector  - Heap is 0.xxx
> full..." and "StorageService  - Flushing CFS(...) to relieve memory
> pressure", there are no log entries to indicate there is a problem.
>
> Does the memory needed vary depending on the amount of data stored?  If
> so, how can I predict how much JVM heap is needed?  I don't want to make
> the heap too large as that's bad too.  Maybe there's a memory leak related
> to compaction that doesn't allow metadata to be purged?
>
>
> -Bryan
>
>
> The host has 12 GB of RAM, with ~6 GB used by Java and ~6 GB for the OS
> and buffer cache.
> $> free -m
>              total       used       free     shared    buffers     cached
> Mem:         12001      11870        131          0          4       5778
> -/+ buffers/cache:       6087       5914
> Swap:            0          0          0
>
>
> JVM settings in cassandra-env:
> MAX_HEAP_SIZE="5G"
> HEAP_NEWSIZE="800M"
>
> # GC tuning options
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>
>
> jstat shows about 12 full collections per minute with old generation usage
> constantly over 75%, so CMS is always over the
> CMSInitiatingOccupancyFraction threshold.
>
> $> jstat -gcutil -t 22917 5000 4
> Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>        132063.0  34.70   0.00  26.03  82.29  59.88  21580  506.887 17523 3078.941 3585.829
>        132068.0  34.70   0.00  50.02  81.23  59.88  21580  506.887 17524 3079.220 3586.107
>        132073.1   0.00  24.92  46.87  81.41  59.88  21581  506.932 17525 3079.583 3586.515
>        132078.1   0.00  24.92  64.71  81.40  59.88  21581  506.932 17527 3079.853 3586.785
>
>
> Other hosts not currently experiencing the high CPU load have a heap less
> than .75 full.
>
> $> jstat -gcutil -t 6063 5000 4
> Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>        520731.6   0.00  12.70  36.37  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
>        520736.5   0.00  12.70  53.25  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
>        520741.5   0.00  12.70  68.92  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
>        520746.5   0.00  12.70  83.11  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
>
>
>
>
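
To put a number on the jstat samples quoted above, a small script like the
following could be used.  It is a sketch only: the column list matches the
"jstat -gcutil -t" header shown above for this JVM, the sample rows are the
four from the busy node, and the function names are made up for illustration.

```python
COLUMNS = ["Timestamp", "S0", "S1", "E", "O", "P",
           "YGC", "YGCT", "FGC", "FGCT", "GCT"]

# The four samples from the node that is stuck in CMS (pid 22917).
SAMPLES = """\
132063.0  34.70   0.00  26.03  82.29  59.88  21580  506.887 17523 3078.941 3585.829
132068.0  34.70   0.00  50.02  81.23  59.88  21580  506.887 17524 3079.220 3586.107
132073.1   0.00  24.92  46.87  81.41  59.88  21581  506.932 17525 3079.583 3586.515
132078.1   0.00  24.92  64.71  81.40  59.88  21581  506.932 17527 3079.853 3586.785
"""


def parse_rows(text):
    """Parse jstat data rows into dicts, skipping headers and odd lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # header line or other non-numeric row
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def summarize(rows, cms_threshold_pct=75.0):
    """Compare the first and last sample to get rates over the window."""
    first, last = rows[0], rows[-1]
    elapsed_s = last["Timestamp"] - first["Timestamp"]
    return {
        "elapsed_s": elapsed_s,
        # FGC events per minute over the sampled window.
        "fgc_per_min": (last["FGC"] - first["FGC"]) * 60.0 / elapsed_s,
        # Share of wall-clock time that jstat attributes to FGC.
        "fgc_time_fraction": (last["FGCT"] - first["FGCT"]) / elapsed_s,
        "min_old_pct": min(row["O"] for row in rows),
        "always_over_threshold": all(row["O"] > cms_threshold_pct
                                     for row in rows),
    }


for name, value in summarize(parse_rows(SAMPLES)).items():
    print(f"{name}: {value}")
```

For these four samples that works out to roughly 16 FGC events per minute
over a 15-second window, in the same ballpark as the rate mentioned above,
with the old generation never dropping below 81%.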

--047d7b62461e4fc1e004cc6d4200--