I'll answer as best I can; others will know more.
1) Most people seem to lean towards giving the JVM more memory, around 16 to 24 GB. Memory is also used by the memtables and, I assume, during compaction.
2) I can't say for sure, but I assume so. I think I've seen data in the cache when I had only done writes.
3) I've noticed large differences between nodes when using the RandomPartitioner (RP) with automatic token assignment, such as the last node ending up with very little data. Try setting the tokens at startup; see http://wiki.apache.org/cassandra/Operations
3.5) Yes, a load balance restores things. I suggest you run it on one node at a time, starting with the node with the lowest load. You can watch the progress by monitoring the streams via JMX or nodetool.
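To make the token suggestion in 3) concrete, here is a minimal sketch of how evenly spaced tokens could be worked out. It assumes the RandomPartitioner token space runs from 0 to 2**127, which is how I read the Operations wiki page; the function name balanced_tokens and the node count of 4 are just for illustration. Where each value goes (the initial token setting in the node's config) depends on your version, so check that before relying on this.

```python
# Assumption: RandomPartitioner tokens fall in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one evenly spaced initial token per node (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division keeps the tokens exact; Python ints do not overflow.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print the token to assign to each node of a 4-node ring.
    for index, token in enumerate(balanced_tokens(4)):
        print("node %d: %d" % (index, token))
```

With tokens spaced like this each node owns an equal slice of the ring, so with the RP the data per node should come out roughly even.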
Hope that helps.
On 03 Aug 2010, at 07:28 AM, Aaron Blew <email@example.com> wrote:
I've got a couple of questions that have come up about how Cassandra works and what others are seeing in their environments. Here goes:
1.) What have you found to be the best ratio of Cassandra row cache to memory free on the system for filesystem cache? Are you tuning it like an RDBMS so Cassandra has the vast majority of the RAM in the system or are you letting the filesystem cache do some of the work?
2.) Is the Cassandra cache write-through (i.e., are new records held in the row cache as they're written to disk)?
3.) When using the random partitioner how much difference should be expected (or has been observed) between nodes? 2%? 10%?
3.5) Can a load balance be expected to bring the data distribution pretty close to even among all nodes in the ring? Is the correct process for a loadbalance to run the loadbalance operation on each node in the ring?
Thanks! I'm curious to hear what others have observed.