From: Oleksandr Shulgin
Date: Thu, 3 Nov 2016 15:46:18 +0100
Subject: Re: failing bootstraps with OOM
To: user@cassandra.apache.org

On Thu, Nov 3, 2016 at 2:32 PM, Mike Torra <mtorra@demandware.com> wrote:
> Hi Alex - I do monitor sstable counts and pending compactions, but
> probably not closely enough. In 3 of the 4 regions the cluster is
> running in, both counts are very high - ~30-40k sstables for one
> particular CF, and on many nodes >1k pending compactions.

It is generally a good idea to keep the number of pending compactions
minimal. We usually see it close to zero on every node during normal
operations, and below a few tens during maintenance such as repair.
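
For a quick look at both numbers, nodetool exposes them directly; a couple
of commands along these lines should work (the exact field names can
differ slightly between versions, and on older releases tablestats is
called cfstats):

    nodetool compactionstats | grep 'pending tasks'
    nodetool tablestats <keyspace>.<table> | grep 'SSTable count'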

> I had noticed this before, but I didn't have a good sense of what a
> "high" number for these values was.

I would say anything higher than 20 probably requires someone to have a
look, and over 1k is very troublesome.
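
If you want to turn that rule of thumb into an alert, a minimal sketch in
Python could look like the following. It assumes nodetool is on the PATH
and that compactionstats prints a "pending tasks: N" line - double-check
the output format of your version before relying on it:

    #!/usr/bin/env python
    # Warn when pending compactions exceed a threshold.
    # Assumes `nodetool compactionstats` prints "pending tasks: <N>";
    # the exact output format can vary between Cassandra versions.
    import re
    import subprocess
    import sys

    THRESHOLD = 20  # anything above this deserves a look

    def pending_compactions():
        out = subprocess.check_output(['nodetool', 'compactionstats'],
                                      universal_newlines=True)
        match = re.search(r'pending tasks:\s*(\d+)', out)
        if match is None:
            raise RuntimeError('could not parse compactionstats output')
        return int(match.group(1))

    if __name__ == '__main__':
        pending = pending_compactions()
        print('pending compactions: %d' % pending)
        sys.exit(1 if pending > THRESHOLD else 0)

You could run it from cron on each node and page on a non-zero exit code.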

> It makes sense to me why this would cause the issues I've seen. After
> increasing concurrent_compactors and compaction_throughput_mb_per_sec
> (to 8 and 64 MB/s, respectively), I'm starting to see those counts go
> down steadily. Hopefully that will resolve the OOM issues, but it looks
> like it will take a while for compactions to catch up.
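
One note on applying these: compaction_throughput_mb_per_sec can also be
changed at runtime, without a restart:

    nodetool setcompactionthroughput 64

concurrent_compactors, on the other hand, is read from cassandra.yaml at
startup, so a rolling restart is needed to pick it up (newer releases also
have nodetool setconcurrentcompactors, but check that your version ships
it):

    # cassandra.yaml -- the values from your message
    concurrent_compactors: 8
    compaction_throughput_mb_per_sec: 64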

> Thanks for the suggestions, Alex

Welcome. :-)

--
Alex
