incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: constant CMS GC using CPU time
Date Tue, 23 Oct 2012 01:05:19 GMT
> The GC was on-going even when the nodes were not compacting or running a heavy application load -- even when the main app was paused, the constant GC continued.
If you restart a node is the onset of GC activity correlated to some event?
 
> As a test we dropped the largest CF and the memory usage immediately dropped to acceptable levels and the constant GC stopped.  So it's definitely related to data load: memtable size is 1 GB, row cache is disabled, and key cache is small (default).
How many keys did the CF have per node? 
I had dismissed the memory used to hold bloom filters and index sampling. That memory is not considered part of the memtable size, and will end up in the tenured heap. It is generally only a problem with very large key counts per node.

>  They were 2+ GB (as reported by nodetool cfstats anyway).  It looks like the default
bloom_filter_fp_chance defaults to 0.0 
The default should be 0.000744.

If the chance is zero or null this code should run when a new SSTable is written:

    // paranoia -- we've had bugs in the thrift <-> avro <-> CfDef dance before,
    // let's not let that break things
    logger.error("Bloom filter FP chance of zero isn't supposed to happen");

Were the CFs migrated from an old version?

> Is there any way to predict how much memory the bloom filters will consume if the size of the row keys, the number of rows, and the fp chance are known?

See o.a.c.utils.BloomFilter.getFilter() in the code.
This http://hur.st/bloomfilter appears to give similar results. 
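For a back-of-the-envelope estimate, the textbook formulas get you most of the way: m = -n * ln(p) / (ln 2)^2 bits for n keys at false-positive chance p, and k = (m / n) * ln 2 hash functions. A minimal sketch (my own helper, not Cassandra code):

```java
// Hypothetical sizing helper -- not from the Cassandra codebase, just the
// standard bloom filter formulas applied to a per-node key count.
public class BloomFilterSize {

    // Bits required for n keys at false-positive chance p:
    // m = -n * ln(p) / (ln 2)^2
    static long bitsNeeded(long keys, double fpChance) {
        return (long) Math.ceil(-keys * Math.log(fpChance)
                                / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions: k = (m / n) * ln 2
    static int hashCount(long keys, double fpChance) {
        return Math.max(1, (int) Math.round(
                (double) bitsNeeded(keys, fpChance) / keys * Math.log(2)));
    }

    public static void main(String[] args) {
        long keys = 100_000_000L;   // e.g. 100M row keys on one node
        double fp = 0.000744;       // the default fp chance mentioned above
        System.out.printf("~%d MB, %d hash functions%n",
                bitsNeeded(keys, fp) / 8 / (1024 * 1024),
                hashCount(keys, fp));
    }
}
```

At the 0.000744 default that is roughly 15 bits per key, i.e. on the order of 180 MB per 100M keys, which is consistent with multi-GB filters once a node holds a billion-plus keys.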

Cheers
 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/10/2012, at 4:38 AM, Bryan Talbot <btalbot@aeriagames.com> wrote:

> The memory usage was correlated with the size of the data set.  The nodes were a bit unbalanced, which is normal due to variations in compactions.  The nodes with the most data used the most memory.  All nodes are affected eventually, not just one.  The GC was on-going even when the nodes were not compacting or running a heavy application load -- even when the main app was paused, the constant GC continued.
> 
> As a test we dropped the largest CF and the memory usage immediately dropped to acceptable levels and the constant GC stopped.  So it's definitely related to data load: memtable size is 1 GB, row cache is disabled, and key cache is small (default).
> 
> I believe one culprit turned out to be the bloom filters.  They were 2+ GB (as reported by nodetool cfstats anyway).  It looks like bloom_filter_fp_chance defaults to 0.0 even though guides recommend 0.10 as the minimum value.  Raising that to 0.20 for some write-mostly CFs reduced memory use by 1 GB or so.
> 
> Is there any way to predict how much memory the bloom filters will consume if the size of the row keys, the number of rows, and the fp chance are known?
> 
> -Bryan
> 
> 
> 
> On Mon, Oct 22, 2012 at 12:25 AM, aaron morton <aaron@thelastpickle.com> wrote:
> If you are using the default settings I would try to correlate the GC activity with some
application activity before tweaking.
> 
> If this is happening on one machine out of 4, ensure that client load is distributed evenly.
> 
> See if the rise in GC activity is related to compaction, repair, or an increase in throughput. OpsCenter or some other monitoring can help with the last one. Your mention of TTL makes me think compaction may be doing a bit of work churning through rows.
>   
> Some things I've done in the past before looking at heap settings:
> * reduce compaction_throughput to reduce the memory churn
> * reduce in_memory_compaction_limit 
> * if needed reduce concurrent_compactors
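Those three knobs live in cassandra.yaml; a sketch (setting names as in the 1.x config, values illustrative rather than recommendations):

```yaml
compaction_throughput_mb_per_sec: 8    # default 16; lower = less memory churn
in_memory_compaction_limit_in_mb: 32   # default 64
concurrent_compactors: 2               # commented out by default (one per core)
```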
> 
>> Currently it seems like the memory used scales with the amount of bytes stored and
not with how busy the server actually is.  That's not such a good thing.
> The memtable_total_space_in_mb setting in cassandra.yaml tells C* how much memory to devote to the memtables. That, together with the global row cache setting, determines how much memory is used for "storing" data, and it will not increase in line with the static data load.
> 
> Nowadays GC issues are typically due to more dynamic forces, like compaction, repair, and throughput.
>  
> Hope that helps. 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 20/10/2012, at 6:59 AM, Bryan Talbot <btalbot@aeriagames.com> wrote:
> 
>> ok, let me try asking the question a different way ...
>> 
>> How does cassandra use memory and how can I plan how much is needed?  I have a 1 GB memtable and 5 GB total heap, and that's still not enough even though the number of concurrent connections and the garbage generation rate are fairly low.
>> 
>> If I were using mysql or oracle, I could compute how much memory could be used by
N concurrent connections, how much is allocated for caching, temp spaces, etc.  How can I
do this for cassandra?  Currently it seems like the memory used scales with the amount of
bytes stored and not with how busy the server actually is.  That's not such a good thing.
>> 
>> -Bryan
>> 
>> 
>> 
>> On Thu, Oct 18, 2012 at 11:06 AM, Bryan Talbot <btalbot@aeriagames.com> wrote:
>> In a 4 node cluster running Cassandra 1.1.5 with sun jvm 1.6.0_29-b11 (64-bit), the nodes are often getting "stuck" in a state where CMS collections of the old space are constantly running.
>> 
>> The JVM configuration uses the standard settings in cassandra-env -- relevant settings are included below.  The max heap is currently set to 5 GB with 800 MB for new size.  I don't believe the cluster is overly busy, and it seems to be performing well enough other than this issue.  When nodes get into this state they never seem to leave it (by freeing up old-space memory) without restarting cassandra.  They typically enter this state while running "nodetool repair -pr", but once they start doing this, restarting them only "fixes" it for a couple of hours.
>> 
>> Compactions are completing and are generally not queued up.  All CFs are using STCS.  The busiest CF consumes about 100 GB of space on disk, is write heavy, and all columns have a TTL of 3 days.  Overall, there are 41 CFs including those used for the system keyspace and secondary indexes.  The number of SSTables per node currently varies from 185 to 212.
>> 
>> Other than frequent log warnings about "GCInspector  - Heap is 0.xxx full..." and
"StorageService  - Flushing CFS(...) to relieve memory pressure" there are no other log entries
to indicate there is a problem.
>> 
>> Does the memory needed vary depending on the amount of data stored?  If so, how can
I predict how much jvm space is needed?  I don't want to make the heap too large as that's
bad too.  Maybe there's a memory leak related to compaction that doesn't allow meta-data to
be purged?
>> 
>> 
>> -Bryan
>> 
>> 
>> 12 GB of RAM in host with ~6 GB used by java and ~6 GB for OS and buffer cache.
>> $> free -m
>>              total       used       free     shared    buffers     cached
>> Mem:         12001      11870        131          0          4       5778
>> -/+ buffers/cache:       6087       5914
>> Swap:            0          0          0
>> 
>> 
>> jvm settings in cassandra-env
>> MAX_HEAP_SIZE="5G"
>> HEAP_NEWSIZE="800M"
>> 
>> # GC tuning options
>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC" 
>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC" 
>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled" 
>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8" 
>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>> 
>> 
>> jstat shows about 12 full collections per minute with old-heap usage constantly over 75%, so CMS is always over the CMSInitiatingOccupancyFraction threshold.
>> 
>> $> jstat -gcutil -t 22917 5000 4
>> Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>>        132063.0  34.70   0.00  26.03  82.29  59.88  21580  506.887 17523 3078.941 3585.829
>>        132068.0  34.70   0.00  50.02  81.23  59.88  21580  506.887 17524 3079.220 3586.107
>>        132073.1   0.00  24.92  46.87  81.41  59.88  21581  506.932 17525 3079.583 3586.515
>>        132078.1   0.00  24.92  64.71  81.40  59.88  21581  506.932 17527 3079.853 3586.785
>> 
>> 
>> Other hosts not currently experiencing the high CPU load have a heap less than 0.75 full.
>> 
>> $> jstat -gcutil -t 6063 5000 4
>> Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>>        520731.6   0.00  12.70  36.37  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
>>        520736.5   0.00  12.70  53.25  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
>>        520741.5   0.00  12.70  68.92  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
>>        520746.5   0.00  12.70  83.11  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> -- 
> Bryan Talbot
> Architect / Platform team lead, Aeria Games and Entertainment
> Silicon Valley | Berlin | Tokyo | Sao Paulo
> 
> 

