From: Bhuvan Rawal
Date: Wed, 22 Jun 2016 18:20:21 +0530
Subject: Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3
To: user@cassandra.apache.org
Thanks for the info Paulo, Robert. I tried further testing with other parameters and the issue persisted. It could be either 11739 or 11206, but I'm skeptical about 11739 because repair works well in 3.5, and 11739 seems to be fixed only in 3.7/3.0.7.

We may resolve this by increasing the heap size (thereby reducing the memory available to the page cache) before upgrading to a newer version.

On Mon, Jun 20, 2016 at 10:00 PM, Paulo Motta wrote:

> You could also be hitting CASSANDRA-11739, which was fixed in 3.0.7 and
> could potentially cause OOMs for long-running repairs.
>
> 2016-06-20 13:26 GMT-03:00 Robert Stupp:
>
>> One possibility might be CASSANDRA-11206 (support large partitions on the
>> 3.0 sstable format), which reduces heap usage for other operations (like
>> repair and compactions) as well.
>> You can verify that by setting column_index_cache_size_in_kb in c.yaml to
>> a really high value like 10000000; if you see the same behaviour in 3.7
>> with that setting, there's not much you can do except upgrade to 3.7, as
>> that change went into 3.6 and not into 3.0.x.
>>
>> --
>> Robert Stupp
>> @snazy
>>
>> On 20 Jun 2016, at 18:13, Bhuvan Rawal wrote:
>>
>> Hi All,
>>
>> We are running Cassandra 3.0.3 in production with a max heap size of 8 GB.
>> There has been a consistent issue with nodetool repair for a while, and
>> we have tried issuing it with multiple options (--pr, --local as well);
>> sometimes the node went down with an OutOfMemory error, and at times nodes
>> stopped accepting any connections, even JMX nodetool commands.
>>
>> Running repair on the same data on 3.7 succeeded without encountering any
>> of the above issues. I then tried increasing the heap to 16 GB on 3.0.3
>> and repair ran successfully.
>>
>> I then compared memory usage during nodetool repair for 3.0.3 (16 GB heap)
>> vs 3.7 (8 GB heap): 3.0.3 occupied 11-14 GB at all times, whereas 3.7
>> stayed between 1 and 4.5 GB while repair ran. Both were full repairs on
>> the same dataset with the same unrepaired data.
>>
>> We would like to know whether this is a known bug that was fixed after
>> 3.0.3, and whether there is a way to run repair on 3.0.3 without
>> increasing the heap size, since 8 GB works for us for all other activities.
>>
>> PFA the VisualVM snapshots.
>>
>> 3.0.3 VisualVM snapshot: consistent heap usage of greater than 12 GB.
>>
>> 3.7 VisualVM snapshot: 8 GB max heap, with max heap usage of about 5 GB.
>>
>> Thanks & Regards,
>> Bhuvan Rawal
>>
>> PS: In case the snapshots are not visible, they can be viewed at the
>> following links:
>> 3.0.3: https://s31.postimg.org/4e7ifsjaz/Screenshot_from_2016_06_20_21_06_09.png
>> 3.7: https://s31.postimg.org/xak32s9m3/Screenshot_from_2016_06_20_21_05_57.png
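[Editor's note] The workaround and diagnostic steps discussed in this thread can be sketched as the following shell/config fragment. File paths assume a package install of Cassandra, and the heap values are the ones mentioned in the thread; treat this as an illustrative sketch, not an exact recipe for every distribution.

```shell
# 1. Raise the JVM heap on 3.0.x, as the thread does (cassandra-env.sh):
#      MAX_HEAP_SIZE="16G"
#    Heap settings here override the auto-calculated defaults.

# 2. On 3.6+/3.7, Robert's suggested check for CASSANDRA-11206: effectively
#    disable the shallow index cache in cassandra.yaml and re-run repair.
#    If heap usage then matches 3.0.3, 11206 is the likely explanation.
#      column_index_cache_size_in_kb: 10000000

# 3. Repair invocations tried in the thread (restart Cassandra after any
#    config change, then run one of):
nodetool repair -pr      # repair only this node's primary token ranges
nodetool repair -local   # restrict repair to the local datacenter
```

These commands act on a live cluster, so heap behaviour is best observed alongside them with VisualVM or `nodetool info`, as the original poster did.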