Subject: Re: heap issues - looking for advices on gc tuning
From: Jason Tang <ares.tang@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 30 Oct 2013 17:34:54 +0800
In-Reply-To: <52705226.9070408@gmail.com>
What's the configuration of the following parameters?
memtable_flush_queue_size:
concurrent_compactors:
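A quick way to check what a node is actually running with is below (the config path is an assumption based on the stock package layout; if either key is commented out, the 1.0.x defaults apply, which as far as I remember are 4 for memtable_flush_queue_size and one compactor per core for concurrent_compactors):

    # Show the effective settings from the node's config file
    grep -E '^(memtable_flush_queue_size|concurrent_compactors)' /etc/cassandra/cassandra.yaml

    # A backed-up flush queue shows up in tpstats as blocked FlushWriter tasks
    nodetool -h localhost tpstats | grep -i flush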


2013/10/30 Piavlo <lolitushka@gmail.com>
Hi,

Below I try to give a full picture of the problem I'm facing.

This is a 12-node cluster, running on EC2 with m2.xlarge instances (17G RAM, 2 CPUs).
Cassandra version is 1.0.8
The cluster normally handles between 1500 and 3000 reads per second (depending on time of day) and 800 to 1700 writes per second, according to OpsCenter. RF=3; no row caches are used.

Memory-relevant configs from cassandra.yaml:
flush_largest_memtables_at: 0.85
reduce_cache_sizes_at: 0.90
reduce_cache_capacity_to: 0.75
commitlog_total_space_in_mb: 4096
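For completeness, the other big memtable knob for heap in 1.0.x is memtable_total_space_in_mb; a quick check is below (the path, and the roughly-1/3-of-heap default when it is left unset, are assumptions from memory of the 1.0-era config):

    # When unset, memtable_total_space_in_mb defaults to about 1/3 of the heap
    # (so ~2.6G with an 8G heap) - worth confirming what this node uses
    grep -E '^#?[[:space:]]*memtable_total_space_in_mb' /etc/cassandra/cassandra.yaml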

relevant JVM options used are:
-Xms8000M -Xmx8000M -Xmn400M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
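One addition that would make the next two-week cycle much more informative is verbose GC logging, appended in cassandra-env.sh alongside the flags above (these are all standard HotSpot flags; the log path is an assumption):

    # Appended in cassandra-env.sh: log every collection with tenuring
    # distribution, so promotion failures and old-gen growth are visible
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"    # path is an assumption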

Now what happens is that with these settings, after a cassandra process restart, GC works fine at the beginning and the heap-used graph looks like a
saw with perfect teeth; eventually the teeth start to shrink until they become barely noticeable, and then cassandra starts to spend lots of CPU time
doing gc. Each such cycle takes about 2 weeks, and then I need to restart the cassandra process to restore performance.
During all this time there are no memory-related messages in the cassandra system.log, except a "GC for ParNew: little above 200ms" once in a while.
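A cheap way to watch this progression between restarts is to sample the old generation directly from the JVM; a sketch with the stock JDK tools (the pgrep pattern is just an assumption about how the process shows up in ps):

    # Print heap generation utilisation every 10s: the O column shows
    # old-generation occupancy (%), FGC the accumulating CMS/full collections
    CASS_PID=$(pgrep -f CassandraDaemon)
    jstat -gcutil $CASS_PID 10000

    # A class histogram of live objects shows what is actually retained on-heap
    # once the saw-teeth flatten out (this triggers a full GC, so use with care)
    jmap -histo:live $CASS_PID | head -30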

Things I've already done trying to reduce this eventual heap pressure:
1) reducing bloom_filter_fp_chance, resulting in a reduction from ~700MB to ~280MB total per node, based on all Filter.db files on the node.
2) reducing key cache sizes, and dropping key caches for CFs which do not have many reads.
3) increasing the heap size from 7000M to 8000M.
None of these has really helped; only the increase from 7000M to 8000M stretched the cycle until excessive gc from ~9 days to ~14 days.
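The on-heap footprint of items 1) and 2) can be tracked per CF from nodetool, to confirm the reductions actually landed (the field names below are from memory of the 1.0-era cfstats output, so treat them as approximate):

    # Per-CF bloom filter and key cache figures; summing them across CFs gives
    # the same inputs as the heap accounting in the graph mentioned below
    nodetool -h localhost cfstats | egrep -i 'Column Family:|Bloom Filter Space Used|Key cache capacity|Key cache size'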

I've tried to graph over time the data that is supposed to be in the heap vs the actual heap size, by summing up all CFs' bloom filter sizes + all CFs' key cache capacities multiplied by the average key size + all CFs' reported memtable data size (I've overestimated the data size a bit on purpose, to be on the safe side).
Here is a link to a graph showing the last 2 days of metrics for a node which could not effectively do GC, and then the cassandra process was restarted.
http://awesomescreenshot.com/0401w5y534
You can clearly see that before and after the restart, the size of the data that is supposed to be in the heap is pretty much the same,
which makes me think that what I really need is GC tuning.
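For what it's worth, here is a back-of-the-envelope version of that accounting (the key cache figure is a made-up placeholder, and the 1/3-of-heap memtable ceiling is an assumption about the 1.0 default):

    # Rough steady-state estimate of live on-heap data, in MB
    BLOOM=280                  # Filter.db total after the fp_chance change
    KEYCACHE=400               # placeholder: cached keys x average key size
    MEMTABLES=$((8000 / 3))    # memtable_total_space_in_mb default ~1/3 of heap
    echo "expected live data: $((BLOOM + KEYCACHE + MEMTABLES)) MB of an 8000 MB heap"

If that sum really stays flat while the old generation keeps growing, the pressure would be coming from promoted short-lived garbage rather than from live data, which again points at the young generation / tenuring settings rather than at the caches.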

Also I suppose that this is not due to the total number of keys each node has, which is between 200 and 300 million keys for all CF key estimates summed on a node.
The nodes have data sizes between 45G and 75G, corresponding to their millions of keys. And all nodes start having heavy GC load after about 14 days.
Also, the excessive GC and heap usage are not affected by load, which varies depending on the time of day (see the read/write rates at the beginning of the mail).
So again, based on this, I assume this is not due to a large number of keys or too much load on the cluster, but to a pure GC misconfiguration issue.

Things I remember having tried for GC tuning:
1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not help.
2) Adding -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0
   -XX:CMSIncrementalDutyCycle=10 -XX:ParallelGCThreads=2 -XX:ParallelCMSThreads=1
   - this actually made things worse.
3) Adding -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not help.
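For reference, the combination that tends to get suggested for this symptom (a tiny -Xmn400M plus MaxTenuringThreshold=1 pushes objects that only live a few seconds straight into the old generation, where only CMS can reclaim them) is to grow the young generation and let objects age in the survivor spaces. The values below are a sketch to experiment with, not something verified on this cluster:

    # A commonly suggested CMS baseline to try (starting points only)
    JVM_OPTS="$JVM_OPTS -Xmn1600M"                     # larger young gen, fewer promotions
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"           # keep real survivor spaces
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"    # let short-lived objects die young
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"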

Also, since it takes about 2 weeks to verify that a GC setting change did not help, trying all the possibilities is a painfully slow process :)
I'd highly appreciate any help and hints on the GC tuning.

tnx
Alex






