From: "Y.Wong"
To: user@cassandra.apache.org
Date: Thu, 18 Dec 2014 16:58:16 -0500
Subject: Re: full gc too often

V

On Dec 4, 2014 11:14 PM, "Philo Yang" <ud1937@gmail.com> wrote:

> Hi all,
>
> I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have trouble with full gc:
> sometimes one or two nodes run a full gc more than once per minute, taking
> over 10 seconds each time; the node then becomes unreachable and the latency
> of the cluster increases.
>
> I grepped the GCInspector's log and found that when a node is running fine,
> without gc trouble, there are two kinds of gc:
> ParNew GC in less than 300ms, which clears Par Eden Space and grows CMS Old
> Gen / Par Survivor Space a little (because GCInspector only logs gc taking
> more than 200ms, there are only a few ParNew GCs in the log).
> ConcurrentMarkSweep in 4000~8000ms, which reduces CMS Old Gen a lot and
> grows Par Eden Space a little; it runs about once every 1-2 hours.
> However, sometimes ConcurrentMarkSweep behaves strangely, like this:
>
> INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 -> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
> INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
> INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 -> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
> INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 -> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
> INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 -> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
> INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 -> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
> INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 -> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
> INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 -> 3579816424; Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
> INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 -> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
> INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 - ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 -> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0
>
> Each time, Old Gen shrinks only a little and Survivor Space is cleared, but
> the heap is still full, so another full gc follows very soon and then the
> node goes down. If I restart the node, it runs fine again without gc trouble.
>
> Can anyone help me find out why full gc can't reduce CMS Old Gen? Is it
> because there are too many objects in the heap that can't be collected? I
> think reviewing the table schema design and adding new nodes to the cluster
> is a good idea, but I still want to know whether there is any other reason
> causing this trouble.
>
> Thanks,
> Philo Yang
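
For context on where lines like the GCInspector output above come from: the JVM exposes cumulative collection counts and pause times for ParNew and ConcurrentMarkSweep through its standard management beans, and a minimal watcher along the following lines (a sketch against java.lang.management, not Cassandra's actual GCInspector code; the 200 ms cutoff mirrors the logging threshold mentioned above) produces similar "GC in Nms" messages:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of GCInspector-style monitoring: poll the standard JMX GC
// beans and report collectors whose pauses exceed a threshold. This is NOT
// Cassandra's actual GCInspector implementation, only an illustration.
public class GcWatcherSketch {
    private static final long THRESHOLD_MS = 200;   // matches the ~200 ms logging cutoff noted above

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> lastTime = new HashMap<String, Long>();
        Map<String, Long> lastCount = new HashMap<String, Long>();
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                String name = gc.getName();              // e.g. "ParNew", "ConcurrentMarkSweep"
                long count = gc.getCollectionCount();
                long time = gc.getCollectionTime();      // cumulative pause time in ms
                Long prevCount = lastCount.get(name);
                Long prevTime = lastTime.get(name);
                long dCount = count - (prevCount == null ? 0 : prevCount);
                long dTime = time - (prevTime == null ? 0 : prevTime);
                if (dCount > 0 && dTime / dCount > THRESHOLD_MS) {
                    System.out.printf("%s GC in %dms (%d collections since last check)%n",
                                      name, dTime / dCount, dCount);
                }
                lastCount.put(name, count);
                lastTime.put(name, time);
            }
            Thread.sleep(1000);
        }
    }
}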
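
Working through the numbers quoted in the log above (back-of-envelope arithmetic on the reported heap sizes, not output from any Cassandra tool): CMS Old Gen stays pinned at roughly 3.58 GB across every cycle, and the ten pauses between 11:28:44 and 11:30:54 sum to about 115 s out of roughly 130 s of wall clock, so the node is spending close to 90% of its time in CMS while reclaiming essentially nothing:

// Back-of-envelope arithmetic on the heap sizes quoted in the log above.
// The figures are copied from the log lines; nothing here touches Cassandra.
public class GcLogArithmetic {
    public static void main(String[] args) {
        long oldGenBefore = 3579838424L;   // first CMS line: "CMS Old Gen: 3579838424 ->"
        long oldGenAfter  = 3579838464L;   // value after the arrow on the same line
        System.out.printf("Old Gen change in one 12.6s CMS cycle: %d bytes%n",
                          oldGenAfter - oldGenBefore);   // +40 bytes: the old gen is not shrinking

        // Ten CMS pauses reported between 11:28:44 and 11:30:54 (~130 s of wall clock).
        double pauseSeconds = 12.648 + 12.227 + 11.538 + 12.180 + 10.574
                            + 11.594 + 11.463 + 9.576 + 11.556 + 12.082;   // ~115 s total
        System.out.printf("Approximate fraction of time in CMS: %.0f%%%n",
                          100.0 * pauseSeconds / 130.0);
    }
}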
