Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (nike.apache.org: 209.85.161.43 is neither permitted nor
 denied by domain of oberman@civicscience.com)
MIME-Version: 1.0
In-Reply-To: <4E1C4C11.8000802@gmail.com>
References: <4E1C4C11.8000802@gmail.com>
From: William Oberman <oberman@civicscience.com>
Date: Thu, 14 Jul 2011 09:11:19 -0400
Message-ID: 
 <CAAjbL_kvekiynrk_WwixU+H1r9uTGFiCfxdUruwK0-3BRXCYRA@mail.gmail.com>
Subject: Re: Survey: Cassandra/JVM Resident Set Size increase
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=bcaec5540084fbb39c04a8074551

--bcaec5540084fbb39c04a8074551
Content-Type: text/plain; charset=ISO-8859-1

I finally upgraded from 0.7.4 to 0.8.0 (using Riptano packages) 2 days ago.
Before, my resident memory (for the java process) would slowly grow without
bound and the OS would kill the process.  But, over the last 2 days, I
_think_ it's been stable.  I'll let you know in a week :-)

My other stats:
AWS large (64 bit, 7.5GB, 4 "compute units", no swap by default and I didn't
enable it manually)
CentOS 5.6
Sun JVM 1.6.0_24-b07
2 column families
4 machine cluster with RF=3
Mostly balanced write/read load (usually more writes)
Not quite "big data" volumes: high 10^6 to low 10^7 ops/day
No deletes or mutations, I only add or read

Everything else is stock; I haven't tuned anything, as performance was ok.
No JVM options other than what was in the package.  No JNA.  Not sure about
the GC patterns.

will

On Tue, Jul 12, 2011 at 9:28 AM, Chris Burroughs
<chris.burroughs@gmail.com> wrote:

> ### Preamble
>
> There have been several reports on the mailing list of the JVM running
> Cassandra using "too much" memory.  That is, the resident set size is much
> greater than (max java heap size + mmapped segments) and continues to grow
> until the process swaps, the kernel OOM killer comes along, or performance just
> degrades too far due to the lack of space for the page cache.  It has
> been unclear from these reports if there is a pattern.  My hope here is
> that by comparing JVM versions, OS versions, JVM configuration etc., we
> will find something.  Thank you everyone for your time.
>
>
> Some example reports:
>  - http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
>  - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
>  - https://issues.apache.org/jira/browse/CASSANDRA-2868
>  - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
>  - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-problem-td6545642.html
>
> For reference, theories include (in no particular order):
>  - memory fragmentation
>  - JVM bug
>  - OS/glibc bug
>  - direct memory
>  - swap-induced fragmentation
>  - some other bad interaction of cassandra/jdk/jvm/os/nio-insanity.
>
> ### Survey
>
> 1. Do you think you are experiencing this problem?
>
> 2.  Why? (This is a good time to share a graph like
> http://www.twitpic.com/5fdabn or
> http://img24.imageshack.us/img24/1754/cassandrarss.png)
>
> 2. Are you using mmap? (If yes, be sure to have read
> http://wiki.apache.org/cassandra/FAQ#mmap , and explain how you have
> used pmap [or another tool] to rule out mmap and top deceiving you.)
>
> 3. Are you using JNA?  Was mlockall successful (it's in the logs on
> startup)?
>
> 4. Is swap enabled? Are you swapping?
>
> 5. What version of Apache Cassandra are you using?
>
> 6. What is the earliest version of Apache Cassandra you recall seeing
> this problem with?
>
> 7. Have you tried the patch from CASSANDRA-2654 ?
>
> 8. What jvm and version are you using?
>
> 9. What OS and version are you using?
>
> 10. What are your jvm flags?
>
> 11. Have you tried limiting direct memory (-XX:MaxDirectMemorySize)?
>
> 12. Can you characterise how much GC your cluster is doing?
>
> 13. Approximately how many read/writes per unit time is your cluster
> doing (per node or the whole cluster)?
>
> 14.  How are your column families configured (key cache size, row cache
> size, etc.)?
>
>
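
The comparison in the preamble (resident set size versus max heap plus
mmapped segments) can be roughly checked on Linux by splitting a process's
RSS into file-backed and anonymous mappings, similar in spirit to what pmap
reports.  A minimal sketch, assuming /proc/<pid>/smaps is available; the
function name rss_by_kind is made up for illustration and is not part of
Cassandra or any tool mentioned in this thread:

```python
import re

# Header line of one mapping in /proc/<pid>/smaps:
#   "start-end perms offset dev inode [path]"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")


def rss_by_kind(smaps_text):
    """Sum Rss (in kB) separately for file-backed and anonymous mappings.

    Assumption: a mapping whose path starts with "/" is file-backed
    (for example an mmapped SSTable); everything else, including
    [heap] and [stack], is counted as anonymous.
    """
    totals = {"file": 0, "anon": 0}
    kind = None
    for line in smaps_text.splitlines():
        match = HEADER.match(line)
        if match:
            path = match.group(1).strip()
            kind = "file" if path.startswith("/") else "anon"
        elif line.startswith("Rss:") and kind is not None:
            # Field lines look like "Rss:    40 kB"
            totals[kind] += int(line.split()[1])
    return totals
```

To use it, read /proc/<pid>/smaps for the java process and pass the text to
rss_by_kind.  The "file" total covers mmapped segments; the "anon" total is
what to compare against the max heap size.  That comparison is only rough,
since anonymous memory also includes thread stacks, direct buffers, and the
JVM's own allocations.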

--bcaec5540084fbb39c04a8074551--