cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Questions around the heap
Date Mon, 12 Nov 2012 20:58:04 GMT
For background, this thread discusses working out the memory requirements for Cassandra: http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

tl;dr you can work it out or guess based on the tenured usage after CMS. 

> How can we know how the heap is being used, monitor it ?
My favourite is to turn on GC logging in cassandra-env.sh.
I can also recommend the GC coverage in this book http://amzn.com/0137142528

You can also use JConsole or anything else that reads the JVM metrics via JMX.
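For reference, cassandra-env.sh of this era ships with the GC logging flags present but commented out; uncommenting them gives something like the below (the log path is illustrative, adjust it for your install):

```shell
# In cassandra-env.sh -- these ship commented out; uncomment to enable GC logging.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
# Example log destination -- pick a path writable by the cassandra user.
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```

PrintTenuringDistribution is particularly useful here, since it shows objects being promoted to the tenured heap.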

> Why have I that much memory used in the heap of my new servers ?
IMHO the m1.xlarge is the best EC2 node (apart from ssd) to use. 

>  I configured a 4G heap with a 200M "new size".

That is a *very* low new heap size. I would expect it to result in frequent premature promotion into the tenured heap, which will make it look like you are using more memory.


> That is the heap that was supposed to be used.
> 
> Memtable  : 1.4G (1/3 of the heap)
> Key cache : 0.1G (min(5% of Heap (in MB), 100MB))
> System     : 1G     (more or less, from datastax doc)
> 
> So we are around 2.5G max in theory out of 3G usable (threshold 0.75 of the heap before flushing memtable because of pressure)
The memtable usage is the maximum value, reached only if all the memtables are full and the flush queue is full. It's not the working size used for memtables; the code tries to avoid ever hitting the maximum.
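As a rough sketch of where those numbers come from (assuming the default of one third of the heap for memtables, i.e. memtable_total_space_in_mb left unset, which matches the 1.4G figure above for a 4G heap):

```shell
# Back-of-envelope heap budget for a 4G heap.
heap_mb=4096
# Default memtable ceiling: one third of the heap.
memtable_mb=$((heap_mb / 3))
echo "memtable ceiling: ${memtable_mb}M"            # -> memtable ceiling: 1365M
# Flush-pressure threshold quoted above: 0.75 of the heap.
flush_threshold_mb=$((heap_mb * 3 / 4))
echo "flush threshold: ${flush_threshold_mb}M"      # -> flush threshold: 3072M
```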
Not sure if the 1G for "system" is still current or what it's actually referring to.


I suggest:
* returning the configuration to the defaults.
* if you have a high number of rows, looking at the working set calculations linked above.
* monitoring the servers to look for triggers for the GC activity, such as compaction or repair.
* looking at your code base for read queries that read a lot of data. It may be writes, but it's often reads.
* if you are using the default compaction strategy, looking at data model rows that have a high number of deletes and/or overwrites over a long time. These can have a high tombstone count.

GC activity is relative to the workload. Try to find things that cause a lot of columns to
be read from disk.
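One way to hunt for those triggers is to run the standard nodetool commands while GC activity shows up in the logs (output formats vary between versions, so treat this as a sketch):

```shell
# Correlate GC pauses in the gc log with cluster work at the same time.
nodetool compactionstats   # active and pending compactions
nodetool tpstats           # thread pool stats; pending reads hint at heavy queries
nodetool cfstats           # per-CF read counts and row sizes
```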

I've found the following JVM tweaks sometimes helpful (the first two are cassandra-env.sh variables, the last two go on JVM_OPTS):

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="1200M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/11/2012, at 10:26 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Does anybody have an answer to any of these questions?
> 
> Alain
> 
> 
> 2012/11/7 Hiller, Dean <Dean.Hiller@nrel.gov>
> +1, I am interested in this answer as well.
> 
> From: Alain RODRIGUEZ <arodrime@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, November 7, 2012 9:45 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Questions around the heap
> 
> s application that heavily scans a particular column family, you would want to inhibit or disable the Bloom filter on the column family by setting it high"
> 

