From: Elliott Sims
Date: Mon, 11 Feb 2019 02:05:49 -0600
Subject: Re: High GC pauses leading to client seeing impact
To: user@cassandra.apache.org
I would strongly suggest you consider an upgrade to 3.11.x. I found it decreased space needed by about 30%, in addition to significantly lowering GC.

As a first step, though, why not just revert to CMS for now if that was working OK for you? Then you can convert one host for diagnosis/tuning so the cluster as a whole stays functional.
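If you do revert a host, it's just the GC section of cassandra-env.sh. A rough sketch of the stock 2.0-era CMS block (the 8G/2G heap sizes below are illustrative, not a recommendation for your hardware):

    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="2G"
    # Stock CMS settings from the 2.0-era cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

Drop the G1 flags (-XX:+UseG1GC and the related tuning options) on that host at the same time, restart, and compare GC logs against an unchanged node.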

That's also a pretty old version of the JDK to be using G1. I would definitely upgrade that to 1.8u202 and see if the problem goes away.
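Since "Time to stop threads" is time-to-safepoint rather than collector work, it's also worth turning on HotSpot's safepoint diagnostics while you investigate. A sketch using standard JDK 8 flags (the 10-second timeout is just an example value):

    # Log every application stop, not only GC-caused ones
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    # Per-safepoint breakdown of spin/block/sync/cleanup time
    JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
    JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"
    # Name the threads that still haven't reached the safepoint after 10s
    JVM_OPTS="$JVM_OPTS -XX:+SafepointTimeout"
    JVM_OPTS="$JVM_OPTS -XX:SafepointTimeoutDelay=10000"

Multi-minute time-to-safepoint on a large-memory box is often swapping or page faults (make sure swap is off) or the GC log write blocking on a busy disk, rather than anything the collector itself is doing.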

On Sun, Feb 10, 2019, 10:22 PM Rajsekhar Mallick <raj.mallick14@gmail.com> wrote:
> Hello Team,
>
> I have a cluster of 17 nodes in production (8 and 9 nodes across 2 DCs).
> Cassandra version: 2.0.11
> Clients connect using Thrift over port 9160
> JDK version: 1.8.0_66
> GC used: G1GC (16GB heap)
> Other GC settings:
> MaxGCPauseMillis=200
> ParallelGCThreads=32
> ConcGCThreads=10
> InitiatingHeapOccupancyPercent=50
> Number of CPU cores per system: 40
> Memory size: 185 GB
> Reads/sec: 300/sec on each node
> Writes/sec: 300/sec on each node
> Compaction strategy used: size-tiered compaction strategy
>
> Identified issues in the cluster:
> 1. Disk space usage across all nodes in the cluster is 80%. We are
> currently working on adding more storage to each node.
> 2. There are 2 tables for which we keep seeing a large number of
> tombstones. For one table, read requests in the last 5 minutes saw 120
> tombstone cells as compared to 4 live cells. Tombstone warnings and
> errors about queries getting aborted are also seen.
>
> Current issues seen:
> 1. We keep seeing GC pauses of a few minutes at random across nodes in
> the cluster. GC pauses of 120 seconds, and even 770 seconds, are seen.
> 2. This leads to nodes stalling and clients seeing direct impact.
> 3. The GC pauses we see are not during any of the G1GC phases. The GC
> log message prints "Time to stop threads took 770 seconds". So it is
> not the garbage collector doing any work; rather, stopping the threads
> at a safepoint is taking that much time.
> 4. This issue surfaced recently after we changed from 8GB (CMS) to
> 16GB (G1GC) across all nodes in the cluster.
>
> Kindly help with the above issue. I am not able to tell whether the GC
> is wrongly tuned or whether this is something else.
>
> Thanks,
> Rajsekhar Mallick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
