From: William Oberman
Date: Thu, 14 Jul 2011 09:11:19 -0400
Subject: Re: Survey: Cassandra/JVM Resident Set Size increase
To: user@cassandra.apache.org
In-Reply-To: <4E1C4C11.8000802@gmail.com>

I finally upgraded from 0.7.4 to 0.8.0 (using the Riptano packages) 2 days ago.
Before, my resident memory (for the java process) would slowly grow without
bound until the OS killed the process. But over the last 2 days, I _think_
it's been stable. I'll let you know in a week :-)

My other stats:
AWS large (64-bit, 7.5GB, 4 "compute units", no swap by default and I didn't
enable it manually)
CentOS 5.6
Sun 1.6.0_24-b07
2 column families
4-machine cluster with RF=3
Mostly balanced write/read load (usually more writes)
Not quite "big data" volumes: high 10^6 or low 10^7 ops/day
No deletes or mutations; I only add or read

Everything else is stock; I haven't tuned anything, as performance was OK.
No JVM options other than what was in the package. No JNA. Not sure about
the GC patterns.

will

On Tue, Jul 12, 2011 at 9:28 AM, Chris Burroughs wrote:

> ### Preamble
>
> There have been several reports on the mailing list of the JVM running
> Cassandra using "too much" memory. That is, the resident set size is
> much greater than (max Java heap size + mmapped segments) and continues
> to grow until the process swaps, the kernel OOM killer comes along, or
> performance just degrades too far due to the lack of space for the page
> cache. It has been unclear from these reports whether there is a pattern.
> My hope here is that by comparing JVM versions, OS versions, JVM
> configuration, etc., we will find something. Thank you everyone for your
> time.
>
> Some example reports:
>  - http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
>  - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
>  - https://issues.apache.org/jira/browse/CASSANDRA-2868
>  - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
>  - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-problem-td6545642.html
>
> For reference, theories include (in no particular order):
>  - memory fragmentation
>  - JVM bug
>  - OS/glibc bug
>  - direct memory
>  - swap-induced fragmentation
>  - some other bad interaction of cassandra/jdk/jvm/os/nio-insanity.
>
> ### Survey
>
> 1. Do you think you are experiencing this problem?
>
> 2. Why? (This is a good time to share a graph like
> http://www.twitpic.com/5fdabn or
> http://img24.imageshack.us/img24/1754/cassandrarss.png)
>
> 2. Are you using mmap? (If yes, be sure to have read
> http://wiki.apache.org/cassandra/FAQ#mmap , and explain how you have
> used pmap [or another tool] to rule out mmap and top deceiving you.)
>
> 3. Are you using JNA? Was mlockall successful (it's in the logs on
> startup)?
>
> 4. Is swap enabled? Are you swapping?
>
> 5. What version of Apache Cassandra are you using?
>
> 6. What is the earliest version of Apache Cassandra you recall seeing
> this problem with?
>
> 7. Have you tried the patch from CASSANDRA-2654?
>
> 8. What JVM and version are you using?
>
> 9. What OS and version are you using?
>
> 10. What are your JVM flags?
>
> 11. Have you tried limiting direct memory (-XX:MaxDirectMemorySize)?
>
> 12. Can you characterise how much GC your cluster is doing?
>
> 13. Approximately how many reads/writes per unit time is your cluster
> doing (per node or the whole cluster)?
>
> 14. How are your column families configured (key cache size, row cache
> size, etc.)?
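
On the pmap question in the survey above: one rough way to check whether mmap is what inflates the number top reports is to split resident memory into file-backed and anonymous mappings. The following is only a minimal sketch in Python, assuming a Linux host where /proc/<pid>/smaps is readable. The function name rss_by_kind and the "file"/"anon" labels are made up for illustration; they are not part of Cassandra or pmap.

```python
import re

# Header line of one mapping in /proc/<pid>/smaps:
#   "start-end perms offset dev inode [pathname]"
MAPPING_RE = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)
# Per-mapping resident size line, e.g. "Rss:      40 kB"
RSS_RE = re.compile(r"^Rss:\s+(\d+)\s+kB$")


def rss_by_kind(smaps_text):
    """Sum resident kB separately for file-backed and anonymous mappings."""
    totals = {"file": 0, "anon": 0}
    kind = None
    for line in smaps_text.splitlines():
        header = MAPPING_RE.match(line)
        if header:
            # Assumption: a pathname starting with "/" means file-backed
            # (mmapped data files, jars, shared libraries); anything else
            # ([heap], [stack], or no name) is counted as anonymous.
            kind = "file" if header.group(1).startswith("/") else "anon"
            continue
        rss = RSS_RE.match(line)
        if rss and kind is not None:
            totals[kind] += int(rss.group(1))
    return totals
```

To use it on a live node, pass it the contents of /proc/<pid>/smaps for the Cassandra JVM. If the "anon" total keeps climbing well past the configured max heap, mmapped SSTables are probably not the explanation. Note that "anon" also covers thread stacks, direct buffers, and native allocations, so it will always be somewhat larger than the heap.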