Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (athena.apache.org: 209.85.214.44 is neither permitted
 nor denied by domain of oberman@civicscience.com)
MIME-Version: 1.0
In-Reply-To: <F9DCFD31-0556-4E3D-9EED-A4EF78758AFD@gmx.net>
References: <75CABA7A-F053-4084-AF9A-101114C72614@gmx.net>
 <CALdd-zhvzjzsP2qWytyc3ALt1mJDbvvN-04b9xzOy869OuszGQ@mail.gmail.com>
 <F9DCFD31-0556-4E3D-9EED-A4EF78758AFD@gmx.net>
From: William Oberman <oberman@civicscience.com>
Date: Thu, 7 Jul 2011 08:19:26 -0400
Message-ID: 
 <CAAjbL_kFasmsPT4c_OEeceOGwKtZuWed+qi1RpRCitW4CXxxJg@mail.gmail.com>
Subject: Re: Cassandra memory problem
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001636eeedcc98a9b004a779bb3d

--001636eeedcc98a9b004a779bb3d
Content-Type: text/plain; charset=ISO-8859-1

I think I had (and have) a similar problem:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
My memory usage grew slowly until I ran out of mem and the OS killed my
process (due to no swap).

I'm still on 0.7.4, but I'm rolling out 0.8.1 next week, which I'm hoping
will fix the problem.  I'm using CentOS with Sun JDK 1.6.0_24-b07.

will

On Thu, Jul 7, 2011 at 7:41 AM, Daniel Doubleday
<daniel.doubleday@gmx.net> wrote:

> Hm - had to dig deeper and it totally looks like a native mem leak to me:
>
> We are still growing with res += 100MB a day. Cassandra is > 8G now
>
> I checked the cassandra process with pmap -x
>
> Here's the human readable (aggregated) output:
>
> Format is mapping name: RSS in KB
>
> Summary:
>
> Total SST: 1961616
> Anon RSS: 6499640
>
> Total RSS: 8478376
>
> Here's a little more detail:
>
> SSTables (data and index files)
> ******
> Attic: 0
> PrivateChatNotification: 38108
> Schema: 0
> PrivateChat: 161048
> UserData: 116788
> HintsColumnFamily: 0
> Rooms: 100548
> Tracker: 476
> Migrations: 0
> ObjectRepository: 793680
> BlobStore: 350924
> Activities: 400044
> LocationInfo: 0
>
> Libraries
> ******
> javajar: 2292
> nativelib: 13028
>
> Other
> ******
> 28201: 32
> jna979649866618987247.tmp: 92
> locale-archive: 1492
> [stack]: 132
> java: 44
> ffi8TsQPY(deleted): 8
>
> And
> ******
> [anon]: 6499640
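
For anyone who wants to reproduce this kind of aggregation, here is a minimal
sketch in Python. It assumes the `pmap -x` column layout shown in this thread
(address, Kbytes, RSS, Dirty, mode, mapping). The SAMPLE text, the file names
in it, and the helper names are made up for illustration, and grouping *.db
files by the prefix before the first "-" is an assumption about SSTable file
naming, not Daniel's actual script.

```python
import re
from collections import defaultdict

# One mapping row of `pmap -x`: address, Kbytes, RSS, Dirty, mode, mapping name.
ROW = re.compile(r"^([0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*)$")

# Hypothetical input; only the large [ anon ] row is taken from the thread.
SAMPLE = """\
28201:   /usr/bin/java -ea
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000040000000      44      44       0 r-x--  java
000000073f600000       0 3093248 3093248 rwx--    [ anon ]
00007f0000000000    1024     512       0 r--s-  UserData-f-1-Data.db
00007f0000100000    2048     256       0 r--s-  UserData-f-1-Index.db
00007f0000300000     100     100     100 rwx--    [ anon ]
"""


def group_key(name):
    """Collapse SSTable files into one bucket per column family.

    Assumption: data and index files are named "<ColumnFamily>-...-Data.db"
    and "<ColumnFamily>-...-Index.db". Everything else keeps its own name.
    """
    if name.endswith(".db"):
        return name.split("-", 1)[0]
    return name


def rss_by_mapping(pmap_text):
    """Sum the RSS column (KB) per grouped mapping name."""
    totals = defaultdict(int)
    for line in pmap_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skips the command line, header, separator and totals
        totals[group_key(match.group(6).strip())] += int(match.group(3))
    return dict(totals)


if __name__ == "__main__":
    for name, kb in sorted(rss_by_mapping(SAMPLE).items(), key=lambda x: -x[1]):
        print(f"{name}: {kb}")
```

To run it against a live process, feed it the output of `pmap -x <pid>`
instead of SAMPLE.
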
>
>
> Maybe the output of pmap is totally misleading, but my interpretation is
> that only 2GB of RSS is attributed to paged-in SSTables.
> I have one large anon block which looks like this:
>
> Address           Kbytes     RSS   Dirty Mode   Mapping
> 000000073f600000       0 3093248 3093248 rwx--    [ anon ]
>
> This is the native heap that's been allocated on startup and mlocked
>
> So there's still 3.5GB of anon memory unaccounted for.
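
The arithmetic behind that figure, as a small sketch in Python. The numbers
are the ones quoted in this thread; the variable names are mine, and the
17120 KB remainder is simply the sum of the "Libraries" and "Other" entries
listed above.

```python
# All values are RSS in KB, copied from the pmap summary in this thread.
total_rss_kb = 8478376
sst_rss_kb = 1961616      # "Total SST"
anon_rss_kb = 6499640     # "[anon]"
heap_block_kb = 3093248   # the single large mlocked anon block
libs_other_kb = (2292 + 13028) + (32 + 92 + 1492 + 132 + 44 + 8)

# Anonymous memory that is not the big heap block.
unexplained_kb = anon_rss_kb - heap_block_kb
unexplained_gb = unexplained_kb * 1024 / 1e9   # decimal gigabytes

# Sanity check: the three groups should add up to the reported total.
accounted_kb = sst_rss_kb + anon_rss_kb + libs_other_kb

print(unexplained_kb, round(unexplained_gb, 2), accounted_kb == total_rss_kb)
```

So the categories add up to the reported total, and roughly 3.5GB of
anonymous memory sits outside the heap block.
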
>
> We haven't deployed https://issues.apache.org/jira/browse/CASSANDRA-2654 yet
> and this might be part of it but I don't think that's the main problem.
> As I said mem goes up by 100MB each day pretty linearly.
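
One way to put a number on "pretty linearly" is to log RSS once a day and fit
a line through it. A minimal sketch in Python, with assumptions stated
plainly: the sample values are invented, the function names are mine, and
the 16GB limit is only a guess at the machine size.

```python
def growth_per_day(samples):
    """Least-squares slope of RSS (KB) over time (days).

    `samples` is a list of (day, rss_kb) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


def days_until(limit_kb, current_kb, slope_kb_per_day):
    """Days until RSS reaches limit_kb, or None if it is not growing."""
    if slope_kb_per_day <= 0:
        return None
    return (limit_kb - current_kb) / slope_kb_per_day


# Invented samples growing by exactly 100MB (102400 KB) per day.
SAMPLES = [(0, 8000000), (1, 8102400), (2, 8204800)]
slope = growth_per_day(SAMPLES)

# Assumed limit of 16GB of RAM; current RSS is the total reported above.
remaining = days_until(16 * 1024 * 1024, 8478376, slope)
```

With no swap configured, `remaining` is only a rough upper bound on how long
the node can run before the OS steps in, since other processes and the page
cache need memory too.
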
>
> Would be great if anyone could verify this by running pmap or talk me off
> the roof by explaining that nothing's the way it seems.
>
> All this might be heavily OS-specific, so maybe that's only on Debian?
>
> Thanks a lot
> Daniel
>
> On Jul 4, 2011, at 2:42 PM, Jonathan Ellis wrote:
>
> mmap'd data will be attributed to res, but the OS can page it out
> instead of killing the process.
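
To see how much of res is file-backed (and so can be dropped by the OS)
versus anonymous, newer Linux kernels split the figure in /proc/<pid>/status.
A minimal sketch in Python, assuming a kernel that reports the RssAnon,
RssFile and RssShmem fields (Linux 4.5 and later, so not the kernels
discussed in this thread). The sample text and its numbers are invented, and
the function name is mine.

```python
def rss_breakdown(status_text):
    """Extract the resident-memory fields (in kB) from /proc/<pid>/status text."""
    wanted = {"VmRSS", "RssAnon", "RssFile", "RssShmem"}
    result = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            result[key] = int(rest.split()[0])  # value is followed by "kB"
    return result


# Invented example of the relevant lines in /proc/<pid>/status.
SAMPLE_STATUS = """\
Name:\tjava
VmRSS:\t 8478376 kB
RssAnon:\t 6499640 kB
RssFile:\t 1978736 kB
RssShmem:\t       0 kB
Threads:\t150
"""

fields = rss_breakdown(SAMPLE_STATUS)
```

On a live system the input would be the contents of /proc/<pid>/status for
the Cassandra process. A steadily growing RssAnon with a fixed heap size
would point away from mmap'd SSTables.
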
>
> On Mon, Jul 4, 2011 at 5:52 AM, Daniel Doubleday
> <daniel.doubleday@gmx.net> wrote:
>
> Hi all,
>
> we have a mem problem with cassandra. res goes up without bounds (well,
> until the os kills the process because we don't have swap).
>
> I found a thread that's about the same problem but on OpenJDK:
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
>
> We are on Debian with Sun JDK.
>
> Resident mem is 7.4G while heap is restricted to 3G.
>
> Is anyone else seeing this with Sun JDK?
>
> Cheers,
> Daniel
>
> :/home/dd# java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> :/home/dd# ps aux |grep java
> cass     28201  9.5 46.8 372659544 7707172 ?   SLl  May24 5656:21
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
> -Xms3000M -Xmx3000M -Xmn400M ...
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 28201 cass      20   0  355g 7.4g 1.4g S    8 46.9   5656:25 java
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
>
>

--001636eeedcc98a9b004a779bb3d--