Aaron,

Have reduced the cache sizes and been monitoring for the past week.  It appears this was the culprit: since the change, the issue has not resurfaced.

For those keeping score at home:

* Sudden, persistent CPU spikes from the Cassandra java process
* Occurred every 24-48 hours and required a restart to resolve
* Reducing row cache sizes on some of the more active column families (which have wide rows) appears to have eliminated the issue.
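
For reference, that change is a per-column-family schema tweak rather than a cassandra.yaml edit; with the 1.0-era cassandra-cli it looks roughly like the fragment below (the keyspace and column family names are placeholders, and the right rows_cached value depends on your row sizes):

```
use MyKeyspace;
update column family Events with rows_cached=1000;
```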


On 2012-01-25, at 7:49 PM, aaron morton wrote:

You are running into GC issues. 

WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

Can you reduce the size of the caches?

As you are under low load, does it correlate with compaction or repair processes? Check nodetool compactionstats.

Do you have wide rows? Check the max row size with nodetool cfstats.
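
A quick way to spot wide rows is to scan the cfstats output for the per-CF maximum; a sketch, with a hypothetical captured sample standing in for a live `nodetool cfstats` run:

```shell
# The "Compacted row maximum size" line (in bytes) flags wide rows.
# The here-doc sample below is hypothetical; on a live node, pipe
# `nodetool cfstats` straight into the grep instead.
grep -E 'Column Family:|Compacted row maximum size' <<'EOF'
Column Family: Events
SSTable count: 12
Compacted row maximum size: 386857368
EOF
```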

Also, if you have made any changes to the default memory and GC settings, try reverting them.


Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 26/01/2012, at 5:24 AM, Vitalii Tymchyshyn wrote:

According to the log, I don't see much time spent in GC. You can still check it with jstat, or uncomment GC logging in cassandra-env.sh. Are you sure you've identified the thread correctly?
It's still possible that you have a memory spike where the GCInspector simply has no chance to run between full GC rounds. Checking with jstat or adding GC logging may help diagnose this.
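
For concreteness, the GC logging block in conf/cassandra-env.sh is a set of commented-out JVM_OPTS lines; uncommenting or adding something like the following (standard HotSpot 1.6 flags; the log path is an example) gives per-collection detail, and jstat offers a live view:

```shell
# GC logging flags to uncomment/add in conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

# Live alternative: heap occupancy and GC time per generation, every 5 s
# (substitute the Cassandra JVM's pid):
#   jstat -gcutil <pid> 5000
```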

On 25.01.12, at 17:24, Matthew Trinneer wrote:
Here is a snippet of what I'm getting out of system.log for GC.  Does anything in there provide a clue?

 WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 INFO [ScheduledTasks:1] 2012-01-22 12:54:57,685 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 240 ms for 1 collections, 111478936 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-22 15:12:21,710 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 1141 ms for 1 collections, 167667688 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-23 14:20:32,862 GCInspector.java (line 123) GC for ParNew: 205 ms for 1 collections, 2894546328 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-23 20:25:06,541 GCInspector.java (line 123) GC for ParNew: 240 ms for 1 collections, 4602331064 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 13:24:57,473 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 27869 ms for 1 collections, 6376733632 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 13:25:24,879 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 26306 ms for 1 collections, 6392079368 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 13:27:12,991 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 238 ms for 1 collections, 131710776 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 13:55:48,326 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 609 ms for 1 collections, 50380160 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 14:34:41,392 GCInspector.java (line 123) GC for ParNew: 325 ms for 1 collections, 1340375240 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 20:55:19,636 GCInspector.java (line 123) GC for ParNew: 233 ms for 1 collections, 6387236992 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:43:28,921 GCInspector.java (line 123) GC for ParNew: 337 ms for 1 collections, 7031219304 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:43:51,043 GCInspector.java (line 123) GC for ParNew: 211 ms for 1 collections, 7025723712 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:50:00,012 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 51534 ms for 2 collections, 6844998736 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:51:22,249 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 250 ms for 1 collections, 154848440 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:57:46,519 GCInspector.java (line 123) GC for ParNew: 244 ms for 1 collections, 190838344 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 15:00:21,693 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 389 ms for 1 collections, 28748448 used; max is 8547991552
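
Reading those lines: the 27-51 s ConcurrentMarkSweep entries at a near-full heap are the classic sign of the collector falling behind, which matches the pegged CMS thread. For example, the 51 s collection on 2012-01-25 at 14:50 reports the heap at roughly 80% of its 8 GB maximum (numbers taken from that log line):

```shell
# used / max from the 14:50:00 ConcurrentMarkSweep line above
used=6844998736
max=8547991552
awk -v u="$used" -v m="$max" 'BEGIN { printf "%.2f\n", u / m }'   # prints 0.80
```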



On 2012-01-25, at 10:09 AM, Vitalii Tymchyshyn wrote:

Hello.

What's in the logs? It should output something like "Hey, you've got most of your memory used. I am going to flush some memtables." Sorry, I don't remember the exact wording, but it comes from the GC inspector, so it should be greppable by "GC".

On 25.01.12, at 16:26, Matthew Trinneer wrote:
Hello Community,

Am troubleshooting an issue with sudden and sustained high CPU on nodes in a 3-node cluster.  This takes place when there is minimal load on the servers, and continues indefinitely until I stop and restart a node.  All three nodes seem to be affected by the same issue; however, it doesn't occur simultaneously on them (although all potentially could be affected at the same time).

Here's what I've been doing to troubleshoot.

* vmstat - have run on all servers and there is *no* swapping going on

* iostat - have run on all servers and there is no indication that the disk is in any way a bottleneck

* nodetool - nothing is backed up.  Not using all available heap (about two-thirds).

* Have tracked it down to a specific thread (top -H).  When I find that thread's hex equivalent in a jstack dump, I see the following:

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x0000000001d48800 nid=0x4534 runnable


Which makes me think it's perhaps something to do with GC configuration.  Unfortunately I'm not able to determine why this might be happening.  Any suggestions on how to continue troubleshooting?
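
One detail worth double-checking in that mapping: `top -H` shows thread ids in decimal, while jstack's `nid` field is hex, so a conversion is needed before grepping the stack dump. A quick sanity check (the decimal id here is just the hex `nid` above converted back):

```shell
tid=17716                     # decimal thread id, as `top -H -p <pid>` shows it
printf 'nid=0x%x\n' "$tid"    # prints nid=0x4534, the CMS GC thread's nid
```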

btw - here's my jvm (just in case)

java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)


Thanks!

Matt