From: "Y.Wong"
To: user@cassandra.apache.org
Date: Thu, 18 Dec 2014 16:58:16 -0500
Subject: Re: full gc too often

V

On Dec 4, 2014 11:14 PM, "Philo Yang" <ud1937@gmail.com> wrote:

> Hi all,
>
> I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have trouble with full gc:
> sometimes one or two nodes run a full gc more than once per minute, taking
> over 10 seconds each time; the node then becomes unreachable and the latency
> of the cluster increases.
>
> I grepped the GCInspector's log and found that when a node is running fine,
> without gc trouble, there are two kinds of gc:
> ParNew GC in less than 300ms, which clears Par Eden Space and grows CMS Old
> Gen / Par Survivor Space a little (because GCInspector only logs gc taking
> more than 200ms, there are only a few ParNew GCs in the log).
> ConcurrentMarkSweep in 4000~8000ms, which reduces CMS Old Gen a lot and
> grows Par Eden Space a little; it runs about once every 1-2 hours.
> However, sometimes ConcurrentMarkSweep behaves strangely, like this:
>
> INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 -> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
> INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
> INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 -> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
> INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 -> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
> INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 -> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
> INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 -> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
> INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 -> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
> INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 -> 3579816424; Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
> INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 -> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
> INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 - ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 -> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0
>
> Each time, Old Gen shrinks only a little and Survivor Space is cleared, but
> the heap is still full, so another full gc follows very soon and then the
> node goes down. If I restart the node, it runs fine again without gc trouble.
>
> Can anyone help me find out why full gc can't reduce CMS Old Gen? Is it
> because there are too many objects in the heap that can't be collected? I
> think reviewing the table schema design and adding new nodes to the cluster
> is a good idea, but I still want to know whether there is any other reason
> causing this trouble.
>
> Thanks,
> Philo Yang
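
For context on where lines like the GCInspector output above come from: the JVM exposes cumulative collection counts and pause times for ParNew and ConcurrentMarkSweep through its standard management beans, and a minimal watcher along the following lines (a sketch against java.lang.management, not Cassandra's actual GCInspector code; the 200 ms cutoff mirrors the logging threshold mentioned above) produces similar "GC in Nms" messages:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of GCInspector-style monitoring: poll the standard JMX GC
// beans and report collectors whose pauses exceed a threshold. This is NOT
// Cassandra's actual GCInspector implementation, only an illustration.
public class GcWatcherSketch {
    private static final long THRESHOLD_MS = 200;   // matches the ~200 ms logging cutoff noted above

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> lastTime = new HashMap<String, Long>();
        Map<String, Long> lastCount = new HashMap<String, Long>();
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                String name = gc.getName();              // e.g. "ParNew", "ConcurrentMarkSweep"
                long count = gc.getCollectionCount();
                long time = gc.getCollectionTime();      // cumulative pause time in ms
                Long prevCount = lastCount.get(name);
                Long prevTime = lastTime.get(name);
                long dCount = count - (prevCount == null ? 0 : prevCount);
                long dTime = time - (prevTime == null ? 0 : prevTime);
                if (dCount > 0 && dTime / dCount > THRESHOLD_MS) {
                    System.out.printf("%s GC in %dms (%d collections since last check)%n",
                                      name, dTime / dCount, dCount);
                }
                lastCount.put(name, count);
                lastTime.put(name, time);
            }
            Thread.sleep(1000);
        }
    }
}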
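
Working through the numbers quoted in the log above (back-of-envelope arithmetic on the reported heap sizes, not output from any Cassandra tool): CMS Old Gen stays pinned at roughly 3.58 GB across every cycle, and the ten pauses between 11:28:44 and 11:30:54 sum to about 115 s out of roughly 130 s of wall clock, so the node is spending close to 90% of its time in CMS while reclaiming essentially nothing:

// Back-of-envelope arithmetic on the heap sizes quoted in the log above.
// The figures are copied from the log lines; nothing here touches Cassandra.
public class GcLogArithmetic {
    public static void main(String[] args) {
        long oldGenBefore = 3579838424L;   // first CMS line: "CMS Old Gen: 3579838424 ->"
        long oldGenAfter  = 3579838464L;   // value after the arrow on the same line
        System.out.printf("Old Gen change in one 12.6s CMS cycle: %d bytes%n",
                          oldGenAfter - oldGenBefore);   // +40 bytes: the old gen is not shrinking

        // Ten CMS pauses reported between 11:28:44 and 11:30:54 (~130 s of wall clock).
        double pauseSeconds = 12.648 + 12.227 + 11.538 + 12.180 + 10.574
                            + 11.594 + 11.463 + 9.576 + 11.556 + 12.082;   // ~115 s total
        System.out.printf("Approximate fraction of time in CMS: %.0f%%%n",
                          100.0 * pauseSeconds / 130.0);
    }
}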
