Date: Fri, 26 Aug 2016 12:21:53 +0000 (UTC)
From: Ryan Svihla
To: user@cassandra.apache.org
Subject: Re: Guidelines for configuring Thresholds for Cassandra metrics

Thomas,

Not all metrics are KPIs; most are only useful when researching a specific issue, or after a use-case-specific threshold has been set.

The main "canaries" I monitor are:

* Pending compactions (dependent on the compaction strategy chosen, but 1000 is a sign of severe issues in all cases)
* Dropped mutations (more than one I treat as an event to investigate; I believe in allowing operational overhead, and any evidence of load shedding suggests I may not have as much as I thought)
* Blocked anything (flush writers, etc.; more than one I investigate)
* System hints (more than 1k I investigate)
* Heap usage and GC time (these vary a lot by use case and collector chosen; I aim for below 65% usage as an average with G1, but this again varies a great deal by use case. Sometimes I just look at the chart and the query patterns, and if they don't line up I have to do other, deeper investigations)
* Read and write latencies exceeding the SLA (also use case dependent; for those that have no SLA, I tend to push towards a p99 of 100ms for a mid-range SSD-based system and 600ms for a spindle-based system, at CL ONE and assuming a "typical" query pattern; again, query patterns and CL vary here)
* Cell count and partition size (these vary greatly by hardware and GC tuning, but in the absence of all other relevant information I like to keep the cell count for a partition below 100k and the size below 100MB. I have many successful use cases running more, and I've had some fail well before that; hardware and tuning tradeoffs shift this around a lot)

There is, unfortunately, as you'll note, a lot of nuance, and the loadout really changes what looks right (down to the model of SSD: I have different expectations for p99s, and if it's a model I haven't used before I'll do some comparative testing).

The reason so much of this is general and vague is my selection bias: I'm brought in when people are complaining about performance, or about some grand systemic crash because they were monitoring nothing. I have little ability to change hardware initially, so I have to be willing to let the hardware do the best it can and establish the levels at which it can no longer keep up with the customer's goals. This may mean that for one use case 10 pending compactions is an actionable event, while for another customer 100 is. The better approach is to establish a baseline for when these metrics start to indicate a serious issue in that particular app. Basically: when people notice a problem, what did these numbers look like in the minutes, hours, and days prior? That's the way to establish the levels consistently.

Regards,

Ryan Svihla

On Fri, Aug 26, 2016 at 4:48 AM -0500, "Thomas Julian" <thomasjulian@zoho.com> wrote:

Hello,

I am working on setting up a monitoring tool for our Cassandra instances. Are there any wikis which specify an optimum value for each Cassandra KPI?

For instance, I am not sure:

1. What value of "Memtable Columns Count" can be considered "Normal"?
2. What value of the same has to be considered "Critical"?

I know the threshold numbers for a few params; for instance, anything more than zero for timeouts or pending tasks should be considered unusual. I am also aware that most statistics' thresholds vary with the hardware specification and the Cassandra environment setup. But what I am asking for here is a general guideline for configuring thresholds for all the metrics.

If this has already been covered, please point me to that resource. If anyone has collected these numbers out of their own interest, please share.

Any help is appreciated.

Best Regards,
Julian.
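The canary thresholds Ryan describes translate directly into a simple alerting check. A minimal sketch in Python, with the caveat that the metric names and the input dict are placeholder assumptions; map them to whatever your monitoring stack actually exposes (JMX, `nodetool tpstats`, `nodetool compactionstats`, etc.):

```python
# Sketch of an alerting check for the canary thresholds discussed in the thread.
# The metric names below are hypothetical; adapt them to your collector's output.

CANARY_THRESHOLDS = {
    "pending_compactions": 1000,   # severe at 1000 regardless of strategy; tune lower per app
    "dropped_mutations": 1,        # any load shedding is worth a look
    "blocked_flush_writers": 1,    # "blocked anything" -> investigate
    "system_hints": 1000,          # more than 1k -> investigate
}

def check_canaries(metrics, heap_usage_avg=None):
    """Return human-readable alerts for any metric at or over its threshold."""
    alerts = [
        f"{name}={metrics.get(name, 0)} (threshold {threshold})"
        for name, threshold in CANARY_THRESHOLDS.items()
        if metrics.get(name, 0) >= threshold
    ]
    # Heap: aim for below ~65% average usage with G1, varying by use case.
    if heap_usage_avg is not None and heap_usage_avg > 0.65:
        alerts.append(f"heap_usage_avg={heap_usage_avg:.0%} (target < 65%)")
    return alerts

print(check_canaries({"pending_compactions": 1200, "dropped_mutations": 0},
                     heap_usage_avg=0.71))
```

The fixed numbers here are only the generic starting points from the thread; per Ryan's advice, a given app's real thresholds should come from its own baseline.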
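Ryan's baselining advice (look at what the numbers did in the minutes, hours, and days before people noticed a problem) can be sketched as a lookback over recorded samples. The window sizes and the synthetic data below are illustrative assumptions:

```python
from datetime import datetime, timedelta
from statistics import mean

def baseline_before(series, incident_time,
                    windows=(timedelta(minutes=10), timedelta(hours=1), timedelta(days=1))):
    """Average a metric over several lookback windows ending at an incident.

    `series` is a list of (datetime, value) samples; returns {window: mean or None}.
    Comparing the windows shows how fast the metric was climbing before the event.
    """
    report = {}
    for window in windows:
        values = [v for t, v in series if incident_time - window <= t < incident_time]
        report[window] = round(mean(values), 1) if values else None
    return report

# Synthetic example: pending compactions ramp up toward an incident at 12:00.
t0 = datetime(2016, 8, 26, 12, 0)
samples = [(t0 - timedelta(minutes=m), 5 * (60 - m)) for m in range(1, 60)]
baseline = baseline_before(samples, t0)
# A 10-minute average far above the 1-hour average indicates a recent, sharp climb.
```

Run against real history, the per-window averages ahead of past incidents give exactly the app-specific levels Ryan suggests, rather than a one-size-fits-all number.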