You're welcome. I'll answer your new questions, but keep in mind that I am neither a Cassandra committer nor a Cassandra specialist.

"You mean that the key cache is not in the heap? I am using Cassandra 1.0.8 and I was under the impression it was; see http://www.datastax.com/docs/1.0/operations/tuning, Tuning Java Heap Size."

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management

If I understood this correctly, it seems that only the row cache is off-heap. So it's not an issue for us as long as we don't use the row cache.

"I thought that key-cache-size + 1GB + memtable space should not exceed heap size. Am I wrong?" 

I don't know if this is a good formula. DataStax gives it, so it shouldn't be that bad :). However, I would say that "key-cache-size + 1GB + memtable space" should not exceed 0.75 * Max Heap (where 0.75 is flush_largest_memtables_at). I keep the default key cache (which is 5% of max heap on 1.1.x, if I remember correctly) and the default memtable space (1/3 of max heap). I have enlarged my heap from 2 to 4 GB because I had some memory pressure (sometimes the Heap Used was greater than 0.75 * Max Heap).
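To make that rule of thumb concrete, here is a minimal bash sketch of the check (all sizes in MB). It assumes memtable space is the default 1/3 of the heap and that flush_largest_memtables_at is 0.75; the function name `fits_in_heap` is made up for illustration and is not something that ships with Cassandra.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks the rule of thumb
#   key_cache + 1024 + memtable_space <= 0.75 * max_heap   (all in MB)
# Assumptions: memtable space = 1/3 of the heap (the default),
#              flush_largest_memtables_at = 0.75.
fits_in_heap() {
  local max_heap_mb=$1
  local key_cache_mb=$2
  local memtable_mb=$(( max_heap_mb / 3 ))
  local needed_mb=$(( key_cache_mb + 1024 + memtable_mb ))
  local budget_mb=$(( max_heap_mb * 75 / 100 ))
  echo "needed=${needed_mb}MB budget=${budget_mb}MB"
  # Return status 0 if the estimate fits in the budget, 1 otherwise
  [ "$needed_mb" -le "$budget_mb" ]
}
```

For example, with the 1867MB default heap and roughly 1.4GB (about 1434MB) of key cache, `fits_in_heap 1867 1434` reports needed=3080MB against a budget of 1400MB, so the check fails. It is only an estimate: real memtable and cache usage moves around.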

"WARN [ScheduledTasks:1] 2012-08-20 12:31:46,506 GCInspector.java (line 145) Heap is 0.7704251937535934 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically"

This message is exactly the memory pressure I was talking about above: the heap was about 77% full, which is over the 0.75 threshold.
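To see how often this happens, a grep/awk pipeline over the Cassandra log is enough. This is a rough sketch: pass whatever log file your install writes to (check your log4j settings for the path), and `heap_warnings` is a made-up name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every "Heap is X full" fraction found in a
# Cassandra log that is above a threshold (default 0.75, i.e. the
# flush_largest_memtables_at value assumed here).
heap_warnings() {
  local log_file=$1
  local threshold=${2:-0.75}
  # grep -o keeps only the "Heap is 0.77... full" part of each matching line;
  # the fraction is then the third whitespace-separated field.
  grep -o 'Heap is [0-9.]* full' "$log_file" |
    awk -v t="$threshold" '($3 + 0) > (t + 0) { print $3 }'
}
```

Piping the output to `wc -l` gives a quick count of how many times Cassandra flushed memtables because of heap pressure.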

"How do I know if my off-heap memory is not used?"

Well, if you have no row cache and your server is only used as a Cassandra node, I'm quite sure you can set your heap to 4GB. I guess htop or any other memory monitoring tool can tell you how much of your memory is used.
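If you prefer a number to eyeballing htop, the sketch below reads /proc/meminfo (Linux only) and sums MemFree, Buffers and Cached as a rough idea of how much RAM is not really in use. The function name and the choice of fields are my own assumptions, not anything Cassandra provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper (Linux only): rough "not really used" memory in MB,
# computed as MemFree + Buffers + Cached from /proc/meminfo (values in kB).
# An alternative meminfo-style file can be passed as $1, e.g. for testing.
free_mb() {
  local meminfo=${1:-/proc/meminfo}
  # The ^ anchor keeps "SwapCached:" from matching "Cached:"
  awk '/^(MemFree|Buffers|Cached):/ { kb += $2 } END { print int(kb / 1024) }' "$meminfo"
}
```

Keep in mind that Cassandra also relies on the OS page cache (the Cached part) for reads, so not all of that figure is really spare before you give it to the JVM heap.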

I hope I didn't tell you too much bullshit :p.

Alain

2012/8/21 Tamar Fraenkel <tamar@tok-media.com>
Thanks for your prompt response. Please see my follow-up questions below.
Thanks!!!



Tamar Fraenkel 
Senior Software Engineer, TOK Media 

On Tue, Aug 21, 2012 at 12:57 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
I have the same configuration, and I recently changed my cassandra-env.sh to:

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="200M"
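For what it's worth, if I remember the stock cassandra-env.sh correctly, it derives the young generation size as min(100MB per CPU core, 1/4 of the max heap). The sketch below re-expresses that under this assumption; `new_size_mb` is a hypothetical name, and you should check your own cassandra-env.sh before relying on it.

```bash
#!/usr/bin/env bash
# Sketch of the young-generation sizing I believe the stock cassandra-env.sh
# uses (an assumption, not copied from the script):
#   HEAP_NEWSIZE = min(100MB * cpu_cores, max_heap / 4)   (all in MB)
new_size_mb() {
  local max_heap_mb=$1
  local cpu_cores=$2
  local per_core_cap_mb=$(( 100 * cpu_cores ))
  local quarter_heap_mb=$(( max_heap_mb / 4 ))
  # Pick the smaller of the two candidates
  if [ "$quarter_heap_mb" -gt "$per_core_cap_mb" ]; then
    echo "$per_core_cap_mb"
  else
    echo "$quarter_heap_mb"
  fi
}
```

On an m1.large (2 virtual cores) with a 4GB heap, `new_size_mb 4096 2` gives 200, which is the HEAP_NEWSIZE="200M" shown above.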

I guess it depends on how much you use the cache (which is now in the off-heap memory).

You mean that the key cache is not in the heap? I am using Cassandra 1.0.8 and I was under the impression it was; see http://www.datastax.com/docs/1.0/operations/tuning, Tuning Java Heap Size.
I thought that key-cache-size + 1GB + memtable space should not exceed heap size. Am I wrong?


I don't use row cache and use the default key cache size.
Me too. I have a key cache capacity of 200000 for all my CFs. If my calculations are correct, I currently have about 1.4GB of key cache.

I have no more memory pressure nor OOM.
I don't see OOM, but I do see messages like the following in my logs:
INFO [ScheduledTasks:1] 2012-08-20 12:31:46,506 GCInspector.java (line 122) GC for ParNew: 219 ms for 1 collections, 1491982816 used; max is 1937768448
 WARN [ScheduledTasks:1] 2012-08-20 12:31:46,506 GCInspector.java (line 145) Heap is 0.7704251937535934 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically



I think that if your off-heap memory is unused, it's better to enlarge the heap (up to a maximum of 8GB).

How do I know if my off-heap memory is not used?
 
Hope this will help.

Alain

2012/8/21 Tamar Fraenkel <tamar@tok-media.com>
Hi!
I have a question regarding Cassandra heap size.
Cassandra calculates the heap size in cassandra-env.sh according to the following algorithm:
    # set max heap size based on the following
    # max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
    # calculate 1/2 ram and cap to 1024MB
    # calculate 1/4 ram and cap to 8192MB
    # pick the max

So, for
system_memory_in_mb=7468
half_system_memory_in_mb=3734
quarter_system_memory_in_mb=1867
This will result in
max(min(3734,1024), min(1867,8192)) = max(1024,1867) = 1867MB, or in other words 1/4 of RAM.
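The commented formula can be re-expressed as a small bash function, which makes it easy to check the arithmetic for other instance sizes. This is my own re-implementation of the comment, not the actual code from cassandra-env.sh, and `default_max_heap_mb` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Re-implementation of the default max-heap formula quoted above:
#   max(min(1/2 ram, 1024MB), min(1/4 ram, 8192MB))
# Input: system memory in MB. Output: max heap in MB.
default_max_heap_mb() {
  local system_memory_in_mb=$1
  local half=$(( system_memory_in_mb / 2 ))
  local quarter=$(( system_memory_in_mb / 4 ))
  # Cap 1/2 RAM at 1024MB and 1/4 RAM at 8192MB
  if [ "$half" -gt 1024 ]; then half=1024; fi
  if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
  # Pick the larger of the two
  if [ "$half" -gt "$quarter" ]; then echo "$half"; else echo "$quarter"; fi
}
```

`default_max_heap_mb 7468` prints 1867, matching the calculation above; small machines get half their RAM and very large ones are capped at 8192MB.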

In http://www.datastax.com/docs/1.0/operations/tuning it says: "Cassandra's default configuration opens the JVM with a heap size of 1/4 of the available system memory (or a minimum 1GB and maximum of 8GB for systems with a very low or very high amount of RAM). Heapspace should be a minimum of 1/2 of your RAM, but a maximum of 8GB. The vast majority of deployments do not benefit from larger heap sizes because (in most cases) the ability of Java 6 to gracefully handle garbage collection above 8GB quickly diminishes."
If I understand this correctly, it would be better for my heap size to be 1/2 of RAM, i.e. 3734MB.
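Read literally, the quoted DataStax sentence suggests half the RAM, capped at 8GB. A tiny sketch of that reading, with a hypothetical function name, for comparison with the default formula:

```bash
#!/usr/bin/env bash
# One reading of the quoted DataStax advice (an interpretation, not the
# script's behaviour): half of the RAM, but never more than 8192MB.
# Input and output in MB.
suggested_max_heap_mb() {
  local half=$(( $1 / 2 ))
  if [ "$half" -gt 8192 ]; then echo 8192; else echo "$half"; fi
}
```

`suggested_max_heap_mb 7468` prints 3734, the figure mentioned above, roughly twice the 1867MB the default formula picks on this instance.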
I am running on an EC2 m1.large instance (7.5 GB memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)).
My system seems to be suffering from a lack of memory, and I should probably increase the heap or (and?) reduce the key cache size.

Would you recommend changing the heap to half RAM?

If yes, should I hard-code it in cassandra-env.sh?

Thanks!

Tamar Fraenkel 
Senior Software Engineer, TOK Media