cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: constant CMS GC using CPU time
Date Fri, 26 Oct 2012 08:49:08 GMT
> How does compaction_throughput relate to memory usage?  
It reduces the rate of memory allocation. 
e.g. Say normally ParNew can keep up with the rate of memory usage without stopping for too
long: the rate of promotion is lowish and everything is allocated in Eden. If the allocation
rate gets higher, ParNew runs more frequently and objects may be promoted to tenured that don't
really need to be there.  

>  I assumed that was more for IO tuning.  I noticed that lowering concurrent_compactors
> to 4 (from default of 8) lowered the memory used during compactions.
Similar thing to above. This may reduce the number of rows held in memory at any instant for
compaction. 

Only rows smaller than in_memory_compaction_limit are loaded fully into memory during compaction,
so reducing that may reduce memory usage.

>  Since then I've reduced the TTL to 1 hour and set gc_grace_seconds to 0 so the number
> of rows and data dropped to a level it can handle.
Cool. Sorry it took so long to get there. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/10/2012, at 8:08 AM, Bryan Talbot <btalbot@aeriagames.com> wrote:

> On Thu, Oct 25, 2012 at 4:15 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> This sounds very much like "my heap is so consumed by (mostly) bloom
>> filters that I am in steady state GC thrash."
>> 
>> Yes, I think that was at least part of the issue.
> 
> The rough numbers I've used to estimate working set are:
> 
> * bloom filter size for 400M rows at 0.00074 fp without Java fudge (they are just a big array): 714 MB
> * memtable size 1024 MB 
> * index sampling:
> 	*  24 bytes + key (16 bytes for UUID) = 32 bytes 
> 	* 400M / 128 default sampling = 3,125,000
> 	*  3,125,000 * 32 = 95 MB
> 	* java fudge X5 or X10 = 475MB to 950MB
> * ignoring row cache and key cache
>  
> So the high side number is 2,213 to 2,688 MB. High because the fudge is a delicious sticky guess and the memtable space would rarely be full. 
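> Those line items can be reproduced with the standard bloom filter sizing formula; a rough sketch using the numbers from this thread (the X5/X10 fudge is, as noted above, a guess):

```python
import math

rows = 400_000_000
fp_chance = 0.00074

# Bloom filter: m = -n * ln(p) / (ln 2)^2 bits -- just a big array, no Java fudge
bloom_mb = (-rows * math.log(fp_chance) / math.log(2) ** 2) / 8 / 1024 ** 2

memtable_mb = 1024  # memtable space, assumed full

entries = rows // 128                  # default index_interval sampling
sample_mb = entries * 32 / 1024 ** 2   # ~95 MB before JVM overhead

low = bloom_mb + memtable_mb + sample_mb * 5     # X5 JVM fudge
high = bloom_mb + memtable_mb + sample_mb * 10   # X10 JVM fudge
print(round(bloom_mb), round(low), round(high))
```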
> 
> On a 5120 MB heap, with 800MB new, you have roughly 4300 MB tenured (some goes to perm)
> and 75% of that is 3,225 MB. Not terrible, but it depends on the working set and how quickly
> stuff gets tenured, which depends on the workload. 
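> Spelling that arithmetic out (the 75% figure assumes the stock -XX:CMSInitiatingOccupancyFraction=75 from cassandra-env.sh):

```python
heap_mb = 5120
new_mb = 800

tenured_mb = heap_mb - new_mb          # ~4300 MB once a little perm gen is taken out
cms_threshold_mb = tenured_mb * 0.75   # old-gen occupancy where CMS starts collecting
print(tenured_mb, cms_threshold_mb)
```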
> 
> These values seem reasonable and in line with what I was seeing.  There are other CFs
> and apps sharing this cluster, but this one was the largest.  
> 
> 
>   
> 
> You can confirm these guesses somewhat manually by enabling all the GC logging in cassandra-env.sh.
> Restart the node and let it operate normally; it's probably best to keep repair off.
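> For anyone following along, "all the GC logging" maps to a handful of JVM options that ship commented out in the stock cassandra-env.sh of this vintage; a sketch (the log path is illustrative):

```shell
# cassandra-env.sh -- GC logging options (uncomment/add, then restart the node)
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# Illustrative path -- point it anywhere the cassandra user can write
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```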
> 
> 
> 
> I was using jstat to monitor GC activity; some snippets from that are in my original
> email in this thread.  The key behavior was that full GC was running pretty often and was never
> able to reclaim much (if any) space.
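> As a concrete example of that kind of monitoring (the pid lookup is illustrative; the columns are from `jstat -gcutil`):

```shell
# Print heap-generation occupancy and GC counts every 5 seconds
jstat -gcutil $(pgrep -f CassandraDaemon) 5000
# The symptom described above: O (old gen) stays near 100% while FGC keeps
# climbing, i.e. frequent full GCs that reclaim little or nothing
```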
> 
> 
>  
> 
> There are a few things you could try:
> 
> * increase the JVM heap by say 1Gb and see how it goes
> * increase the bloom filter false positive chance; try 0.1 first (see http://www.datastax.com/docs/1.1/configuration/storage_configuration#bloom-filter-fp-chance)
> * increase index_interval sampling in the yaml.  
> * decreasing compaction_throughput and in_memory_compaction_limit can lessen the additional
> memory pressure compaction adds. 
> * disable caches or ensure off heap caches are used.
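> For reference, three of those knobs live in cassandra.yaml on 1.1 (values below are illustrative, not recommendations; bloom_filter_fp_chance is set per column family rather than in the yaml):

```yaml
# cassandra.yaml -- illustrative values only
index_interval: 256                   # default 128; larger = smaller index sample in memory
compaction_throughput_mb_per_sec: 8   # default 16; throttles compaction's allocation rate
in_memory_compaction_limit_in_mb: 32  # default 64; wider rows take the slower two-pass path
```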
> 
> I've done several of these already, in addition to changing the app to reduce the number
> of rows retained.  How does compaction_throughput relate to memory usage?  I assumed that
> was more for IO tuning.  I noticed that lowering concurrent_compactors to 4 (from the default
> of 8) lowered the memory used during compactions.  in_memory_compaction_limit_in_mb seems
> to only be used for wide rows, and this CF didn't have any rows wider than in_memory_compaction_limit_in_mb.
> My multithreaded_compaction is still false.
> 
>  
> 
> Watching the GC logs and the Cassandra log is a great way to get a feel for what works
> in your situation.  Also take note of any scheduled processing your app does which may impact
> things, and look for poorly performing queries. 
> 
> Finally, this book is a good reference on Java GC: http://amzn.com/0137142528 
> 
> For my understanding, what was the average row size for the 400 million keys? 
> 
> 
> 
> The compacted row mean size for the CF is 8815 bytes (as reported by cfstats), but that comes
> out to be much larger than the real load per node I was seeing.  Each node had about 200GB
> of data for the CF, with 4 nodes in the cluster and RF=3.  At the time, the TTL for all columns
> was 3 days and gc_grace_seconds was 5 days.  Since then I've reduced the TTL to 1 hour and
> set gc_grace_seconds to 0, so the number of rows and data dropped to a level it can handle.
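> A quick back-of-the-envelope check shows how large that mismatch was (this ignores compression and the TTL'd data that had already been dropped, which is presumably where the difference went):

```python
rows = 400_000_000
mean_row_bytes = 8815        # compacted row mean size from cfstats
rf, nodes = 3, 4

naive_total = rows * mean_row_bytes          # ~3.5 TB of row data cluster-wide
naive_per_node = naive_total * rf / nodes    # what the mean size would predict
observed_per_node = 200 * 1024 ** 3          # ~200 GB actually on each node

print(naive_per_node / observed_per_node)    # prediction is more than 10x observed
```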
> 
> 
> -Bryan
> 

