incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthew Trinneer <matth...@coveritlive.com>
Subject Re: 1.0.6 - High CPU troubleshooting
Date Mon, 06 Feb 2012 13:57:53 GMT
Aaron,

Have reduced cache sizes and been monitoring for the past week.  It appears as if this was
the culprit - since the changes have not seen a resurfacing.  

For those keeping score at home.

* Had sudden persistent spikes in CPU from the Cassandra java process
* Occurred every 24-48 hours and required a restart to resolve
* Reducing row cache sizes on some of the more active column families (which have wide rows)
appears to eliminate the issue.


On 2012-01-25, at 7:49 PM, aaron morton wrote:

> You are running into GC issues. 
> 
>>> WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap
is 0.7767292149986439 full.  You may need to reduce memtable and/or cache sizes.  Cassandra
will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at
threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> 
> Can you reduce the size of the caches ? 
> 
> As you are under low load, does it correlate with compaction or repair processes ? Check
node tool compactioninfo
> 
> Do you have wide rows ? Checlk the max row size with nodetool cfstats. 
> 
> Also, if you have made any changes to the default memory and gc settings try reverting
them. 
> 
> 
> Hope that helps. 
> 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 26/01/2012, at 5:24 AM, Vitalii Tymchyshyn wrote:
> 
>> According to the log, I don't see much time spent for GC. You can still check it
with jstat or uncomment GC logging in cassandra-env.sh. Are you sure you've identified the
thread correctly?
>> It's still possible that you have memory spike where GCInspector simply has no chance
to run between Full GC rounds. Checking with jstat or adding GC logging may help to diagnose.
>> 
>> 25.01.12 17:24, Matthew Trinneer написав(ла):
>>> Here is a snippet of what I'm getting out of system.log for GC.  Anything is
there provide a clue?
>>> 
>>>  WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146)
Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or cache sizes.  Cassandra
will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at
threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>>>  INFO [ScheduledTasks:1] 2012-01-22 12:54:57,685 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 240 ms for 1 collections, 111478936 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-22 15:12:21,710 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 1141 ms for 1 collections, 167667688 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-23 14:20:32,862 GCInspector.java (line 123)
GC for ParNew: 205 ms for 1 collections, 2894546328 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-23 20:25:06,541 GCInspector.java (line 123)
GC for ParNew: 240 ms for 1 collections, 4602331064 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 13:24:57,473 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 27869 ms for 1 collections, 6376733632 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 13:25:24,879 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 26306 ms for 1 collections, 6392079368 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 13:27:12,991 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 238 ms for 1 collections, 131710776 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 13:55:48,326 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 609 ms for 1 collections, 50380160 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 14:34:41,392 GCInspector.java (line 123)
GC for ParNew: 325 ms for 1 collections, 1340375240 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 20:55:19,636 GCInspector.java (line 123)
GC for ParNew: 233 ms for 1 collections, 6387236992 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:43:28,921 GCInspector.java (line 123)
GC for ParNew: 337 ms for 1 collections, 7031219304 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:43:51,043 GCInspector.java (line 123)
GC for ParNew: 211 ms for 1 collections, 7025723712 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:50:00,012 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 51534 ms for 2 collections, 6844998736 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:51:22,249 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 250 ms for 1 collections, 154848440 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:57:46,519 GCInspector.java (line 123)
GC for ParNew: 244 ms for 1 collections, 190838344 used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 15:00:21,693 GCInspector.java (line 123)
GC for ConcurrentMarkSweep: 389 ms for 1 collections, 28748448 used; max is 8547991552
>>> 
>>> 
>>> 
>>> On 2012-01-25, at 10:09 AM, Vitalii Tymchyshyn wrote:
>>> 
>>>> Hello.
>>>> 
>>>> What's in the logs? It should output something like "Hey, you've got most
of your memory used. I am going to flush some of memtables". Sorry, I don't remember exact
spelling, but it's gong from GC, so it should be greppable by "GC".
>>>> 
>>>> 25.01.12 16:26, Matthew Trinneer написав(ла):
>>>>> Hello Community,
>>>>> 
>>>>> Am troubleshooting an issue with sudden and sustained high CPU on nodes
in a 3 node cluster.  This takes place when there is minimal load on the servers, and continues
indefinitely until I stop and restart a node.  All nodes (3) seem to be effected by the same
issue, however it doesn't occur simultaneous on them (although all potentially could be effected
at the same time).
>>>>> 
>>>>> Here's what I've been doing to troubleshoot.
>>>>> 
>>>>> * vmstat - have run on all servers and there is *no* swap going on
>>>>> 
>>>>> * iostat - have run on all servers and there is no indication that the
disk is in anyway a bottleneck
>>>>> 
>>>>> * nodetool - nothing is backed up.  Not using all available heap (about
2/3rds).
>>>>> 
>>>>> * Have tracked it down to a specific thread (top -H).  When I find that
thread's hex equivalent in a jstack dump I see the following
>>>>> 
>>>>> 	"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x0000000001d48800 nid=0x4534
runnable
>>>>> 
>>>>> 
>>>>> Which makes me think that perhaps it's something to do with GC configuration.
  Unfortunately I'm not able to determine why this might be happening.  Any suggestions on
how to continue troubleshooting?
>>>>> 
>>>>> btw - here's my jvm (just in case)
>>>>> 
>>>>> java version "1.6.0_22"
>>>>> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
>>>>> 
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> Matt
>>>>> 
>> 
> 


Mime
View raw message