Message-ID: <4E1C4C11.8000802@gmail.com>
Date: Tue, 12 Jul 2011 09:28:49 -0400
From: Chris Burroughs <chris.burroughs@gmail.com>
To: user@cassandra.apache.org
Subject: Survey: Cassandra/JVM Resident Set Size increase

### Preamble

There have been several reports on the mailing list of the JVM running
Cassandra using "too much" memory.  That is, the resident set size is
much greater than (max Java heap size + mmapped segments) and continues
to grow until the process swaps, the kernel OOM killer comes along, or
performance just degrades too far due to the lack of space for the page
cache.  It has been unclear from these reports whether there is a
pattern.  My hope here is that by comparing JVM versions, OS versions,
JVM configuration, etc., we will find something.  Thank you everyone for
your time.
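
To make "resident set size much greater than heap + mmapped segments"
concrete, here is a rough sketch (Linux only) that sums the Rss fields of
/proc/<pid>/smaps and splits them into file-backed mappings (which is
where mmapped sstables show up) and everything else.  The function names
are made up for illustration, and treating any mapping whose name starts
with "/" as file-backed is a simplifying assumption; pmap gives the same
information by hand.

```python
import re
from collections import defaultdict

# A mapping header line starts with an address range such as "7f00-7f01 ".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def read_smaps(pid):
    """Return the text of /proc/<pid>/smaps (Linux only, needs permission)."""
    with open("/proc/%d/smaps" % pid) as f:
        return f.read()


def rss_by_mapping(smaps_text):
    """Sum Rss (kB) per mapping name; unnamed mappings are grouped as [anon]."""
    totals = defaultdict(int)
    name = None
    for line in smaps_text.splitlines():
        if HEADER.match(line):
            # Fields: range, perms, offset, dev, inode, optional pathname.
            fields = line.split(None, 5)
            name = fields[5].strip() if len(fields) > 5 else "[anon]"
        elif line.startswith("Rss:") and name is not None:
            totals[name] += int(line.split()[1])
    return dict(totals)


def split_rss(totals):
    """Return (file_backed_kb, other_kb); assumes paths start with '/'."""
    file_kb = sum(kb for n, kb in totals.items() if n.startswith("/"))
    other_kb = sum(kb for n, kb in totals.items() if not n.startswith("/"))
    return file_kb, other_kb
```

Calling split_rss(rss_by_mapping(read_smaps(pid))) with the Cassandra
process id gives two numbers; if the second one is far above the
configured max heap, that is the situation this survey is about.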


Some example reports:
 - http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
 - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
 - https://issues.apache.org/jira/browse/CASSANDRA-2868
 - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
 - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-problem-td6545642.html

For reference, theories include (in no particular order):
 - memory fragmentation
 - JVM bug
 - OS/glibc bug
 - direct memory
 - swap-induced fragmentation
 - some other bad interaction of cassandra/jdk/jvm/os/nio-insanity.

### Survey

1. Do you think you are experiencing this problem?

2.  Why? (This is a good time to share a graph like
http://www.twitpic.com/5fdabn or
http://img24.imageshack.us/img24/1754/cassandrarss.png)

2. Are you using mmap? (If yes, be sure to have read
http://wiki.apache.org/cassandra/FAQ#mmap , and explain how you have
used pmap [or another tool] to rule out mmap and top deceiving you.)

3. Are you using JNA?  Was mlockall successful (it's in the logs on startup)?

4. Is swap enabled? Are you swapping?

5. What version of Apache Cassandra are you using?

6. What is the earliest version of Apache Cassandra you recall seeing
this problem with?

7. Have you tried the patch from CASSANDRA-2654 ?

8. What JVM and version are you using?

9. What OS and version are you using?

10. What are your JVM flags?

11. Have you tried limiting direct memory (-XX:MaxDirectMemorySize)?

12. Can you characterise how much GC your cluster is doing?

13. Approximately how many reads/writes per unit time is your cluster
doing (per node or the whole cluster)?

14.  How are your column families configured (key cache size, row cache
size, etc.)?
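
If it helps when answering the swap and mlockall questions, the sketch
below pulls the relevant counters out of /proc/<pid>/status on Linux.
The function name is hypothetical.  VmRSS is the resident set size,
VmLck is memory locked into RAM (it should be non-zero if mlockall
worked), and VmSwap is how much of the process is swapped out; VmSwap may
be missing on older kernels, in which case it is simply left out of the
result.

```python
def parse_status(status_text):
    """Extract VmRSS, VmLck and VmSwap (in kB) from /proc/<pid>/status text."""
    wanted = ("VmRSS", "VmLck", "VmSwap")
    values = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:   1234 kB"; keep the number only.
            values[key] = int(rest.split()[0])
    return values
```

Feed it the contents of /proc/<pid>/status for the Cassandra process,
for example parse_status(open("/proc/%d/status" % pid).read()), and paste
the numbers along with your answers.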