incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: cassandra GC cpu usage
Date Wed, 17 Jul 2013 09:49:45 GMT
Dive into the logs and look for messages from the GCInspector. These log ParNew and CMS collections that take over 200 ms. To get further insight, consider enabling full GC logging (see cassandra-env.sh) on one of the problem nodes.
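
As a rough sketch, the stock cassandra-env.sh ships with the relevant JVM options commented out; uncommenting lines along these lines and restarting the node turns on verbose GC logging (the gc.log path is just an example, put it wherever suits your setup):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

The PrintTenuringDistribution output in particular shows how quickly objects are aging out of the survivor spaces, which is useful when premature promotion into tenured is suspected.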

Looking at your graphs, you are getting about 2 ParNew collections a second, each running around 130 ms, so the server is pausing for roughly 260 ms per second to do ParNew. That is not great.

CMS activity can also suck up CPU, especially if it's not able to drain the tenured heap.
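
If you want a quick look at whether tenured is draining, one option (assuming the JDK's jstat is available on the node) is to watch old-gen occupancy live, for example:

    jstat -gcutil <cassandra-pid> 1000

The O column is old-gen occupancy as a percentage; if it sits near 100% while the FGC count keeps climbing, CMS is running back to back without reclaiming much.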

ParNew activity is more of a measure of the throughput on the node. Can you correlate the problems with application load? Do they happen at regular intervals? Can you correlate them with repair or compaction processes?
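
Something simple like the following is usually enough for that correlation (the system.log path is the packaged default, adjust for your install):

    # pauses the GCInspector has already flagged
    grep GCInspector /var/log/cassandra/system.log

    # anything compacting or validating (validation compactions come from repair)
    nodetool compactionstats

    # per-stage backlog and dropped messages, plus streaming activity
    nodetool tpstats
    nodetool netstats

If the dropped message counters in tpstats only climb on the two problem nodes, that also points at them receiving a disproportionate share of the load.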

Hope that helps 

-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/07/2013, at 12:14 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:

> What's your replication factor? Can you check tpstats and netstats to see if you are getting more mutations on these nodes?
> 
> Sent from my iPhone
> 
> On Jul 16, 2013, at 3:18 PM, Jure Koren <jure.koren@zemanta.com> wrote:
> 
>> Hi C* user list,
>> 
>> I have a curious recurring problem with Cassandra 1.2 and what seems like a GC issue.
>> 
>> The cluster looks somewhat well balanced; all nodes are running HotSpot JVM 1.6.0_31-b04 and Cassandra 1.2.3.
>> 
>> Address Rack Status State Load Owns
>> 10.2.3.6 RAC6 Up Normal 15.13 GB 12.71%
>> 10.2.3.5 RAC5 Up Normal 16.87 GB 13.57%
>> 10.2.3.8 RAC8 Up Normal 13.27 GB 13.71%
>> 10.2.3.1 RAC1 Up Normal 16.46 GB 14.08%
>> 10.2.3.7 RAC7 Up Normal 11.59 GB 14.34%
>> 10.2.3.2 RAC2 Up Normal 23.15 GB 15.12%
>> 10.2.3.4 RAC4 Up Normal 16.52 GB 16.47%
>> 
>> Every now and then (roughly once a month, currently), two nodes (always the same two) need to be restarted after they start eating all available CPU cycles and read and write latencies increase dramatically. A restart fixes this every time.
>> 
>> The only metric that significantly deviates from the average for all nodes shows GC doing something: http://bou.si/rest/parnew.png
>> 
>> Is there a way to debug this? After searching online, it appears that nobody has really solved this problem, and I have no idea what could cause such behaviour in just two particular cluster nodes.
>> 
>> I'm now thinking of decommissioning the problematic nodes and bootstrapping them anew, but I can't decide whether this could possibly help.
>> 
>> Thanks in advance for any insight anyone might offer,
>> 
>> --
>> Jure Koren, DevOps
>> http://www.zemanta.com/

