From: Marcus Eriksson
Date: Wed, 22 Jun 2016 15:03:34 +0200
Subject: Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3
To: user@cassandra.apache.org

It could also be CASSANDRA-11412 if you have many sstables and vnodes.
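A quick way to check whether you are in that many-sstables-plus-vnodes territory (a sketch; the keyspace/table name and the cassandra.yaml path are just examples):

    # per-table sstable count on the affected node
    nodetool cfstats my_keyspace.my_table | grep "SSTable count"
    # number of vnodes configured per node
    grep num_tokens /etc/cassandra/cassandra.yaml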

On Wed, Jun 22, 2016 at 2:50 PM, Bhuvan Rawal <bhu1rawal@gmail.com> wrote:
Thanks for the info Paulo, Robert. I tried further testing with other parameters and the issue persisted. It could be either 11739 or 11206, but I'm skeptical about 11739 because repair works well in 3.5, while 11739 was only fixed in 3.7/3.0.7.
We may work around this by increasing the heap size (giving up some page cache in the process) before upgrading to a newer version.

On Mon, Jun 20, 2016 at 10:00 PM, Paulo Motta <pauloricardomg@gmail.com> wrote:
You could also be hitting CASSANDRA-11739, which was fixed in 3.0.7 and could potentially cause OOMs for long-running repairs.


2016-06-20 13:26 GMT-03:00 Robert Stupp <snazy@snazy.de>:
One possibility might be CASSANDRA-11206 (support large partitions on the 3.0 sstable format), which reduces heap usage for other operations (like repair and compactions) as well.
You can verify that by setting column_index_cache_size_in_kb in c.yaml to a really high value like 10000000 - if you see the same behaviour in 3.7 with that setting, there's not much you can do except upgrading to 3.7, as that change went into 3.6 and not into 3.0.x.
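That is, on the 3.7 test node, something like this (a sketch; the stock cassandra.yaml is assumed):

    # cassandra.yaml
    # A value this large keeps the entire column index on heap,
    # mimicking the pre-11206 (3.0.x) behaviour for comparison.
    column_index_cache_size_in_kb: 10000000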

--
Robert Stupp
@snazy

On 20 Jun 2016, at 18:13, Bhuvan Rawal <bhu1rawal@gmail.com> wrote:

Hi All,

We are running Cassandra 3.0.3 in production with a max heap size of 8GB. There has been a consistent issue with nodetool repair for a while, and we have tried issuing it with multiple options (--pr, --local) as well. Sometimes the node went down with an Out of Memory error, and at times nodes stopped accepting any connections, even JMX nodetool commands.
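For reference, the invocations we tried were along these lines (a sketch; "my_keyspace" stands in for our actual keyspaces):

    # full repair of only this node's primary token ranges
    nodetool repair -pr my_keyspace
    # repair restricted to the local datacenter
    nodetool repair --in-local-dc my_keyspace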
=
Trying with the same data on 3.7, repair ran successfully without encountering any of the above-mentioned issues. I then tried increasing the heap to 16GB on 3.0.3, and repair ran successfully there as well.
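The heap increase was a plain cassandra-env.sh change (assuming the heap is configured there and not overridden elsewhere; the new-gen size is just an example):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="16G"
    HEAP_NEWSIZE="1600M"  # example value; often sized around 100MB per core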

I then analyzed memory usage during nodetool repair for 3.0.3 (16GB heap) vs 3.7 (8GB heap): 3.0.3 occupied 11-14 GB at all times, whereas 3.7 stayed between 1 and 4.5 GB while repair ran. Both ran a full repair on the same dataset with the same unrepaired data.
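(The VisualVM numbers can also be cross-checked from a shell with jstat, if that is easier to script; the pid below is assumed to be the Cassandra JVM's:)

    # heap generation occupancy and GC percentages, sampled every 5 seconds
    jstat -gcutil <cassandra-pid> 5000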

We would like to know whether this is a known bug that was fixed after 3.0.3, and whether there is a way to run repair on 3.0.3 without increasing the heap size, since 8GB works for us for all other activities.

PFA the VisualVM snapshots.

[Screenshot: 3.0.3 VisualVM snapshot, consistent heap usage of greater than 12 GB]


[Screenshot: 3.7 VisualVM snapshot, 8GB max heap, with peak heap usage of about 5GB]

Thanks & Regards,
Bhuvan Rawal

PS: In case the snapshots are not visible, they can be viewed at the following links:
3.0.3: https://s31.postimg.org/4e7ifsjaz/Screenshot_from_2016_06_20_21_06_09.png
3.7: https://s31.postimg.org/xak32s9m3/Screenshot_from_2016_06_20_21_05_57.png
