I was reading the Apache Cassandra 1.0 Documentation PDF dated May 10, 2012, and have a question about the recommended memory size.
Below is the snippet from the PDF. Bullet 1 recommends 16-32GB of RAM, yet Bullet 2 says to limit the Java heap size to no more than 8GB. My understanding is that Cassandra is implemented purely in Java, so all the memory it sees and uses is the JVM heap. Can someone help me understand the discrepancy between 16-32GB of RAM and 8GB of heap? Thanks.
The more memory a Cassandra node has, the better its read performance. More RAM allows for larger cache sizes and reduces disk I/O for reads. More RAM also allows memory tables (memtables) to hold more recently written data. Larger memtables lead to fewer SSTables being flushed to disk and fewer files to scan during a read. The ideal amount of RAM depends on the anticipated size of your hot data.
• For dedicated hardware, a minimum of 8GB of RAM is needed. DataStax recommends 16GB - 32GB.
• Java heap space should be set to a maximum of 8GB or half of your total RAM, whichever is lower. (A greater heap size has more intense garbage collection periods.)
• For a virtual environment, use a minimum of 4GB, such as Amazon EC2 Large instances. For production clusters with a healthy amount of traffic, 8GB is more common.
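
To make the arithmetic in the second bullet concrete, here is a minimal sketch of the "8GB or half of total RAM, whichever is lower" rule as I read it. The function name `recommended_max_heap_gb` and its parameters are my own, purely for illustration; they are not taken from Cassandra or its configuration scripts.

```python
def recommended_max_heap_gb(total_ram_gb: float, cap_gb: float = 8.0) -> float:
    """Heap ceiling per the quoted rule: the lower of cap_gb and half of RAM.

    Hypothetical helper for illustration only; not part of Cassandra.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(cap_gb, total_ram_gb / 2)


# Walk through the RAM sizes mentioned in the quoted bullets.
for ram_gb in (4, 8, 16, 32):
    heap_gb = recommended_max_heap_gb(ram_gb)
    outside_gb = ram_gb - heap_gb
    print(f"{ram_gb} GB RAM -> heap <= {heap_gb:g} GB, "
          f"{outside_gb:g} GB outside the heap")
```

Under this reading, a 16GB node tops out at an 8GB heap and a 32GB node still tops out at 8GB, so the recommended 16-32GB of RAM always leaves at least as much memory outside the JVM heap as inside it. That remainder is the part my question is about.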