Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CABwtvz6SLU8YNYAGY08Ob5zK0DmpcVbw67CJDN6wjc8OpH40yw@mail.gmail.com>
References: 
 <CABwtvz6bBu9yxt7kzMNqchGcJ-wT7E79vxDr-eMkquztYhtbQQ@mail.gmail.com>
	<0CAA1D64-4C66-429D-AF72-EB88904AACEF@crowdstrike.com>
	<BLUPR08MB1810371F0A8219F27B1201439BE90@BLUPR08MB1810.namprd08.prod.outlook.com>
	<CABwtvz6SLU8YNYAGY08Ob5zK0DmpcVbw67CJDN6wjc8OpH40yw@mail.gmail.com>
Date: Tue, 15 Dec 2015 19:41:14 -0500
Message-ID: 
 <CAOxAL60Q2f5Kn7n-=fFZkwsZe_MCdw0snTSeMHvw=GbPDtDW3Q@mail.gmail.com>
Subject: Re: Unable to start one Cassandra node: OutOfMemoryError
From: Jack Krupansky <jack.krupansky@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a114271acd84f8f0526f929b3

--001a114271acd84f8f0526f929b3
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Can a core Cassandra committer verify whether removing the compactions_in_progress
folder is indeed the desired and recommended solution to this problem, or
whether it might in fact be a bug that this workaround is needed at all?
Thanks!

-- Jack Krupansky

On Thu, Dec 10, 2015 at 5:34 PM, Mikhail Strebkov <strebkov@gmail.com>
wrote:

> Steve, thanks a ton! Removing compactions_in_progress helped! Now the node
> is running again.
>
> P.S. Sorry for referring to you by your last name in my last email; I got
> confused.
>
> On Thu, Dec 10, 2015 at 2:09 AM, Walsh, Stephen <Stephen.Walsh@aspect.com>
> wrote:
>
>> 8 GB is the maximum recommended heap size, and that’s only if you have
>> 32 GB or more of RAM available.
>>
>>
>>
>> We use 6 GB on our 16 GB machines and it’s very stable.
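For context on these numbers: as far as we know, the stock cassandra-env.sh shipped with Cassandra 2.1 picks a default maximum heap of max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)), which is consistent with the 8 GB ceiling above. The sketch below reproduces that rule under that assumption; the function name is hypothetical, and the script shipped with your own version is the authority.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default max heap in MB, assuming the Cassandra 2.1 cassandra-env.sh rule:
    max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))."""
    half_capped = min(system_memory_mb // 2, 1024)     # 1/2 of RAM, capped at 1 GB
    quarter_capped = min(system_memory_mb // 4, 8192)  # 1/4 of RAM, capped at 8 GB
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    for ram_gb in (2, 8, 16, 32, 64):
        heap_mb = default_max_heap_mb(ram_gb * 1024)
        print(f"{ram_gb:>3} GB RAM -> {heap_mb} MB default max heap")
```

Under this rule a 16 GB machine would default to a 4 GB heap, so a 6 GB heap is an explicit override (MAX_HEAP_SIZE in cassandra-env.sh, which as far as we know must be set together with HEAP_NEWSIZE), and the 8 GB cap is reached at 32 GB of RAM.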
>>
>>
>>
>> The out-of-memory error could be coming from Cassandra reloading
>> compactions_in_progress into memory; you can check this in the log files
>> if need be.
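A minimal sketch of that log check: it scans a Cassandra log for OutOfMemoryError lines and for mentions of compactions_in_progress. The default path /var/log/cassandra/system.log is an assumption (common for package installs), and the exact log wording varies by version, so the patterns are deliberately loose.

```python
import re
import sys
from pathlib import Path

# Loose patterns; exact log messages differ between Cassandra/DSE versions.
PATTERNS = {
    "oom": re.compile(r"OutOfMemoryError"),
    "compactions_in_progress": re.compile(r"compactions_in_progress"),
}


def scan_log(path: Path) -> dict:
    """Return {pattern name: [(line number, line text), ...]} for matching lines."""
    hits = {name: [] for name in PATTERNS}
    with path.open(encoding="utf-8", errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    hits[name].append((lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed default location; pass a different path as the first argument.
    log_path = Path(sys.argv[1] if len(sys.argv) > 1 else "/var/log/cassandra/system.log")
    for name, matches in scan_log(log_path).items():
        print(f"{name}: {len(matches)} matching line(s)")
        for lineno, text in matches[-5:]:  # show only the most recent hits
            print(f"  {lineno}: {text}")
```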
>>
>> You can safely delete this folder inside the data directory.
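A cautious variant of that step is to move the directory aside instead of deleting it, with the node stopped. The sketch below assumes the default data directory /var/lib/cassandra/data and the 2.1 layout, in which the table lives under system/compactions_in_progress-<table id>; both are assumptions to check against data_file_directories in cassandra.yaml (run it once per configured directory), and the backup root is hypothetical.

```python
import shutil
import sys
import time
from pathlib import Path


def move_compactions_in_progress(data_dir: Path, backup_root: Path) -> list:
    """Move system/compactions_in_progress* out of data_dir (node must be stopped).

    Returns the new locations of the moved directories.
    """
    backup_dir = backup_root / f"compactions_in_progress-backup-{int(time.time())}"
    moved = []
    # The directory name carries a table-id suffix in 2.1, hence the glob.
    for table_dir in sorted((data_dir / "system").glob("compactions_in_progress*")):
        if table_dir.is_dir():
            backup_dir.mkdir(parents=True, exist_ok=True)
            target = backup_dir / table_dir.name
            shutil.move(str(table_dir), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    data = Path(sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data")
    backup = Path(sys.argv[2] if len(sys.argv) > 2 else "/var/tmp")  # hypothetical backup root
    for path in move_compactions_in_progress(data, backup):
        print(f"moved to {path}")
```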
>>
>>
>>
>> This can happen if you didn’t stop Cassandra with a drain command and
>> wait for the compactions to finish.
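The shutdown sequence described above (drain, let compactions finish, then stop) might be scripted roughly as follows. nodetool drain and nodetool compactionstats are real commands; parsing the "pending tasks: N" line is an assumption about the 2.1 output format, and the service stop command is a placeholder. The command runner is injectable so the logic can be exercised without a live node.

```python
import re
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]

PENDING_RE = re.compile(r"pending tasks:\s*(\d+)", re.IGNORECASE)


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout, raising if it exits non-zero."""
    return subprocess.run(list(cmd), check=True, capture_output=True, text=True).stdout


def pending_compactions(compactionstats_output: str) -> int:
    """Parse the 'pending tasks: N' line from nodetool compactionstats output."""
    match = PENDING_RE.search(compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line in compactionstats output")
    return int(match.group(1))


def drain_and_wait(runner: Runner = run, poll_seconds: float = 10.0,
                   timeout_seconds: float = 3600.0) -> None:
    """Drain the node, then block until no compactions are pending (or time out)."""
    runner(["nodetool", "drain"])  # stop accepting writes and flush memtables
    deadline = time.monotonic() + timeout_seconds
    while pending_compactions(runner(["nodetool", "compactionstats"])) > 0:
        if time.monotonic() > deadline:
            raise TimeoutError("compactions still pending; not stopping the node")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    drain_and_wait()
    run(["sudo", "service", "dse", "stop"])  # placeholder: use your own stop command
```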
>>
>> The last time we hit it, it was due to testing HA when we force-killed an
>> entire cluster.
>>
>>
>>
>> Steve
>>
>> *From:* Jeff Jirsa [mailto:jeff.jirsa@crowdstrike.com]
>> *Sent:* 10 December 2015 02:49
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Unable to start one Cassandra node: OutOfMemoryError
>>
>>
>>
>> 8G is probably too small for a G1 heap. Raise your heap or try CMS
>> instead.
>>
>>
>>
>> 71% of your heap is collections; it may be a weird data model quirk, but
>> try CMS first and see if that behaves better.
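To compare the two collectors suggested above, a hypothetical helper like this one assembles HotSpot (Java 8) heap and GC options for either CMS or G1. The flags themselves are standard HotSpot options; the values (CMS initiating occupancy 75%, a 500 ms G1 pause target) are illustrative assumptions, not necessarily what your cassandra-env.sh sets.

```python
from typing import List, Optional


def gc_jvm_opts(collector: str, heap: str, young_gen: Optional[str] = None) -> List[str]:
    """Build HotSpot (Java 8) heap and GC options for 'cms' or 'g1'.

    heap and young_gen use JVM size syntax such as '8G' or '800M'.
    """
    opts = [f"-Xms{heap}", f"-Xmx{heap}"]  # fixed-size heap: min == max
    if collector == "cms":
        if young_gen:
            opts.append(f"-Xmn{young_gen}")  # explicit young gen (what HEAP_NEWSIZE controls)
        opts += [
            "-XX:+UseParNewGC",
            "-XX:+UseConcMarkSweepGC",
            "-XX:CMSInitiatingOccupancyFraction=75",
            "-XX:+UseCMSInitiatingOccupancyOnly",
        ]
    elif collector == "g1":
        if young_gen:
            raise ValueError("leave the young generation unsized with G1 so it can meet its pause goal")
        opts += ["-XX:+UseG1GC", "-XX:MaxGCPauseMillis=500"]
    else:
        raise ValueError(f"unknown collector: {collector!r}")
    return opts


if __name__ == "__main__":
    print(" ".join(gc_jvm_opts("cms", "8G", "800M")))
    print(" ".join(gc_jvm_opts("g1", "16G")))
```

The G1 branch rejects an explicit young generation size because fixing it (for example with -Xmn) overrides G1's own sizing against its pause-time target.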
>>
>>
>> *From: *Mikhail Strebkov
>> *Reply-To: *"user@cassandra.apache.org"
>> *Date: *Wednesday, December 9, 2015 at 5:26 PM
>> *To: *"user@cassandra.apache.org"
>> *Subject: *Unable to start one Cassandra node: OutOfMemoryError
>>
>>
>>
>> Hi everyone,
>>
>>
>>
>> While upgrading our 5-machine cluster from DSE version 4.7.1 (Cassandra
>> 2.1.8) to DSE version 4.8.2 (Cassandra 2.1.11), one of the nodes can't
>> start and fails with an OutOfMemoryError.
>>
>> We're using HotSpot 64-Bit Server VM/1.8.0_45 and the G1 garbage collector
>> with an 8 GiB heap.
>>
>> The average data size per node is 300 GiB.
>>
>>
>>
>> I looked at the heap dump with the YourKit profiler (www.yourkit.com), but
>> it was quite hard since the dump is so big, and I can't get much out of it:
>> http://i.imgur.com/fIRImma.png
>>
>>
>>
>> As far as I understand the report, there are 1,332,812 instances of
>> org.apache.cassandra.db.Row which retain 8 GiB. I don't understand why all
>> of them are still strongly reachable.
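As a rough sanity check on those figures, 8 GiB retained across 1,332,812 Row instances works out to about 6,445 bytes (roughly 6.3 KiB) per row on average. The snippet only performs that division; the profiler's 8 GiB figure is rounded, so treat the result as an order of magnitude.

```python
GIB = 1024 ** 3

retained_bytes = 8 * GIB      # "retain 8 GiB" from the YourKit report (rounded)
row_instances = 1_332_812     # org.apache.cassandra.db.Row instances in the dump

avg_bytes_per_row = retained_bytes / row_instances
print(f"{avg_bytes_per_row:,.0f} bytes (~{avg_bytes_per_row / 1024:.1f} KiB) retained per Row")
```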
>>
>>
>>
>> Please help me debug this. I don't even know where to start.
>>
>> I feel very uncomfortable with 1 node running 4.8.2, 1 node down, and 3
>> nodes running 4.7.1 at the same time.
>>
>>
>>
>> Thanks,
>>
>> Mikhail
>>
>> This email (including any attachments) is proprietary to Aspect Software,
>> Inc. and may contain information that is confidential. If you have received
>> this message in error, please do not read, copy or forward this message.
>> Please notify the sender immediately, delete it from your system and
>> destroy any copies. You may not further disclose or distribute this email
>> or its attachments.
>>
>
>

--001a114271acd84f8f0526f929b3--