On Thu, Oct 25, 2012 at 4:15 AM, aaron morton <firstname.lastname@example.org> wrote:
This sounds very much like "my heap is so consumed by (mostly) bloom filters that I am in steady state GC thrash."

Yes, I think that was at least part of the issue.

The rough numbers I've used to estimate the working set are:

* bloom filter size for 400M rows at 0.00074 fp, without the java fudge (they are just a big array): 714 MB
* memtable size: 1024 MB
* index sampling:
    * 24 bytes + key (16 bytes for UUID) = 32 bytes per sample
    * 400M / 128 (default sampling) = 3,125,000 samples
    * 3,125,000 * 32 bytes = 95 MB
    * java fudge x5 or x10 = 475 MB to 950 MB
* ignoring row cache and key cache

So the high-side estimate is 2,213 to 2,688 MB. High because the fudge is a delicious sticky guess and the memtable space would rarely be full.

On a 5120 MB heap, with an 800 MB new gen, you have roughly 4,300 MB tenured (some goes to perm) and 75% of that is 3,225 MB. Not terrible, but it depends on the working set and how quickly stuff gets tenured, which depends on the workload.
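If it helps to sanity check, here is a quick back-of-envelope sketch in Java using the same numbers; the per-sample size, the x5/x10 fudge and the 75% occupancy figure are just the estimates from this thread, not measured values.

    // Rough working-set estimate; all constants mirror the figures above.
    public class WorkingSetEstimate {
        public static void main(String[] args) {
            long rows = 400_000_000L;          // 400M rows
            double bloomMb = 714;              // fp chance 0.00074, raw bit array
            double memtableMb = 1024;          // memtable space
            // index sampling: ~32 bytes per sample, 1 in every 128 keys sampled
            double indexMb = (rows / 128.0) * 32 / (1024 * 1024);      // ~95 MB
            double low  = bloomMb + memtableMb + indexMb * 5;          // x5 java fudge, ~2,213 MB
            double high = bloomMb + memtableMb + indexMb * 10;         // x10 java fudge, ~2,688 MB
            System.out.printf("index samples ~%.0f MB, working set ~%.0f to %.0f MB%n",
                    indexMb, low, high);
            // 5120 MB heap minus 800 MB new gen; CMS starts around 75% tenured occupancy
            double tenuredMb = 5120 - 800;
            System.out.printf("tenured ~%.0f MB, 75%% occupancy ~%.0f MB%n",
                    tenuredMb, tenuredMb * 0.75);
        }
    }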
You can confirm these guesses somewhat manually by enabling all the GC logging in cassandra-env.sh. Restart the node and let it operate normally; it's probably best to keep repair off while you watch.
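For reference, the GC logging options in cassandra-env.sh are just the standard HotSpot flags, shipped commented out; something along these lines (the exact set varies a little between versions, and the log path is only an example):

    # in cassandra-env.sh - uncomment / add to turn on verbose GC logging
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"   # example path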
There are a few things you could try:

* increase the JVM heap by, say, 1 GB and see how it goes
* increase the bloom filter false positive chance; try 0.1 first (see http://www.datastax.com/docs/1.1/configuration/storage_configuration#bloom-filter-fp-chance)
* increase the index_interval sampling in the yaml
* decrease compaction_throughput and in_memory_compaction_limit, which can lessen the additional memory pressure compaction adds
* disable caches or ensure off-heap caches are used
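To make those concrete, this is roughly where each knob lives on 1.1 (names as I remember them; the column family name and the values below are only placeholders, not recommendations):

    # cassandra.yaml
    index_interval: 256                      # default 128; higher = fewer samples, less heap
    compaction_throughput_mb_per_sec: 8      # default 16
    in_memory_compaction_limit_in_mb: 32     # default 64

    # per column family, via cassandra-cli (MyCF is a placeholder):
    update column family MyCF with bloom_filter_fp_chance = 0.1;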
Watching the GC logs and the cassandra log is a great way to get a feel for what works in your situation. Also take note of any scheduled processing your app does which may impact things, and look for poorly performing queries.

Finally, this book is a good reference on Java GC: http://amzn.com/0137142528

For my understanding, what was the average row size for the 400 million keys?