cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Elliott Sims <elli...@backblaze.com>
Subject Re: High GC pauses leading to client seeing impact
Date Mon, 11 Feb 2019 08:05:49 GMT
I would strongly suggest you consider an upgrade to 3.11.x.  I found it
decreased space needed by about 30% in addition to significantly lowering
GC.

As a first step, though, why not just revert to CMS for now if that was
working ok for you?  Then you can convert one host for diagnosis/tuning so
the cluster as a whole stays functional.

That's also a pretty old version of the JDK to be using G1.  I would
definitely upgrade that to 1.8u202 and see if the problem goes away.

On Sun, Feb 10, 2019, 10:22 PM Rajsekhar Mallick <raj.mallick14@gmail.com
wrote:

> Hello Team,
>
> I have a cluster of 17 nodes in production.(8 and 9 nodes in 2 DC).
> Cassandra version: 2.0.11
> Client connecting using thrift over port 9160
> Jdk version : 1.8.066
> GC used : G1GC (16GB heap)
> Other GC settings:
> Maxgcpausemillis=200
> Parallels gc threads=32
> Concurrent gc threads= 10
> Initiatingheapoccupancypercent=50
> Number of cpu cores for each system : 40
> Memory size: 185 GB
> Read/sec : 300 /sec on each node
> Writes/sec : 300/sec on each node
> Compaction strategy used : Size tiered compaction strategy
>
> Identified issues in the cluster:
> 1. Disk space usage across all nodes in the cluster is 80%. We are
> currently working on adding more storage on each node
> 2. There are 2 tables for which we keep on seeing large number of
> tombstones. One of table has read requests seeing 120 tombstones cells in
> last 5 mins as compared to 4 live cells. Tombstone warns and Error messages
> of query getting aborted is also seen.
>
> Current issue sen:
> 1. We keep on seeing GC pauses of few minutes randomly across nodes in the
> cluster. GC pauses of 120 seconds, even 770 seconds are also seen.
> 2. This leads to nodes getting stalled and client seeing direct impact
> 3. The GC pause we see, are not during any of G1GC phases. The GC log
> message prints “Time to stop threads took 770 seconds”. So it is not the
> garbage collector doing any work but stopping the threads at a safe point
> is taking so much of time.
> 4. This issue has surfaced recently after we changed 8GB(CMS) to
> 16GB(G1GC) across all nodes in the cluster.
>
> Kindly do help on the above issue. I am not able to exactly understand if
> the GC is wrongly tuned, other if this is something else.
>
> Thanks,
> Rajsekhar Mallick
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
>
>

Mime
View raw message