I *think* people lean towards giving more memory to the JVM than to the file cache. People often email the list about the JVM running out of memory, so give it more and see how much it actually uses in your case. Your nodes will have a minimum memory requirement based on the Memtable thresholds, the cache settings and the usage patterns. It's not like, say, MS SQL Server, where you can tell it to live within a certain amount of memory.

http://wiki.apache.org/cassandra/MemtableThresholds may give some background.  
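
A rough sketch of how those pieces might add up into a minimum heap size. The multiplier of 3 per Memtable and the 1 GB baseline are assumptions here (a commonly quoted rule of thumb, not a guarantee), and the function and parameter names are made up for illustration, so measure real usage before trusting the number.

```python
def estimate_min_heap_mb(memtable_throughput_mb, hot_column_families,
                         cache_mb, overhead_factor=3, base_mb=1024):
    """Back-of-the-envelope lower bound for the JVM heap, in MB.

    overhead_factor and base_mb are assumed rule-of-thumb values,
    not settings read from Cassandra.
    """
    memtable_mb = memtable_throughput_mb * overhead_factor * hot_column_families
    return memtable_mb + base_mb + cache_mb


# Hypothetical node: 64 MB Memtable threshold, 4 busy column families,
# roughly 512 MB of key/row cache.
print(estimate_min_heap_mb(64, 4, 512), "MB")
```

Anything the heap does not use stays available to the OS for the file cache.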

Sorry it's not very clear; perhaps someone else can give a better answer.

Aaron

On 03 Aug, 2010,at 12:11 PM, Aaron Blew <aaronblew@gmail.com> wrote:

1.) 16 to 24GB out of how much total system memory?  Is this 50% of available system RAM or 90%?

Thanks for the reply!
-Aaron


On Mon, Aug 2, 2010 at 2:24 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
Will answer as best I can, others will know more.

1) Most people seem to lean towards more memory for the JVM, around 16 to 24 GB. Memory is also used by the Memtables and, I assume, during compaction.

2) Cannot say for sure, but I assume so. Think I've seen the cache with data in it when I have only done writes.

3) I've noticed large differences between nodes when using the RandomPartitioner (RP) with automatic token assignment, such as the last node ending up with very little data. Try setting the tokens at startup, see http://wiki.apache.org/cassandra/Operations

3.5) Yes, load balance restores things. I suggest you run it on one node at a time, starting with the node with the lowest load. You can watch the progress by monitoring the streams via JMX or nodetool.
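
For the token suggestion in point 3, here is a minimal sketch of how evenly spaced tokens can be computed. It assumes the RandomPartitioner token space runs from 0 to 2**127 and that each node gets its token through the initial token setting in its config; the function name is just for illustration.

```python
RING_SIZE = 2 ** 127  # assumed size of the RandomPartitioner token space


def balanced_tokens(node_count):
    """Return one evenly spaced token per node for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic only: the tokens are far too large for floats.
    return [i * RING_SIZE // node_count for i in range(node_count)]


# Print the token to assign to each node in a 4-node ring.
for index, token in enumerate(balanced_tokens(4)):
    print(f"node {index}: {token}")
```

With tokens spaced like this, each node owns the same share of the ring, so the load should stay close to even without needing a load balance afterwards.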

Hope that helps.
Aaron




On 03 Aug, 2010,at 07:28 AM, Aaron Blew <aaronblew@gmail.com> wrote:

Hi All,
I've got a couple questions that have come up about how Cassandra works and what others are seeing in their environments.  Here goes:

1.) What have you found to be the best ratio of Cassandra row cache to memory free on the system for filesystem cache?  Are you tuning it like an RDBMS so Cassandra has the vast majority of the RAM in the system or are you letting the filesystem cache do some of the work?

2.) Is the Cassandra cache write-through (i.e. are new records held in the row cache as they're written to disk)?

3.) When using the random partitioner how much difference should be expected (or has been observed) between nodes?  2%? 10%?

3.5) Can a load balance be expected to bring the data distribution pretty close to even among all nodes in the ring?  Is the correct process for a loadbalance to run the loadbalance operation on each node in the ring?


Thanks!  I'm curious to hear what others have observed.
-Aaron