From: Keith Wright <kwright@nanigans.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thu, 24 Jul 2014 14:22:55 -0500
Subject: Re: Hot, large row

I can see from cfhistograms that I do have some wide rows (see below). I set the trace probability as you suggested, but the output doesn't appear to tell me which row was actually read, unless I missed something; I just see "executing prepared statement". Any ideas how I can find the row in question?

I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs.

Any help in identifying my issue would be GREATLY appreciated.
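A minimal sketch of pulling slow sessions out of the trace tables, run in cqlsh and assuming the stock C* 2.0 system_traces schema (the session id in the second query is a placeholder for one returned by the first):

    -- List recent traced sessions; duration is in microseconds.
    SELECT session_id, duration, started_at, parameters
    FROM system_traces.sessions
    LIMIT 100;

    -- Drill into one of the slowest sessions to see per-replica activity.
    SELECT activity, source, source_elapsed
    FROM system_traces.events
    WHERE session_id = <session id from above>;

With prepared statements the bound partition key does not appear to be recorded in parameters, so the trace narrows the search to a session and a replica rather than naming the row outright.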
Cell Count per Partition

    1 cells: 50449950
    2 cells: 14281828
    3 cells: 8093366
    4 cells: 5029200
    5 cells: 3103023
    6 cells: 3059903
    7 cells: 1903018
    8 cells: 1509297
   10 cells: 2420359
   12 cells: 1624895
   14 cells: 1171678
   17 cells: 1289391
   20 cells: 909777
   24 cells: 852081
   29 cells: 722925
   35 cells: 587067
   42 cells: 459473
   50 cells: 358744
   60 cells: 304146
   72 cells: 244682
   86 cells: 191045
  103 cells: 155337
  124 cells: 127061
  149 cells: 98913
  179 cells: 77454
  215 cells: 59849
  258 cells: 46117
  310 cells: 35321
  372 cells: 26319
  446 cells: 19379
  535 cells: 13783
  642 cells: 9993
  770 cells: 6973
  924 cells: 4713
 1109 cells: 3229
 1331 cells: 2062
 1597 cells: 1338
 1916 cells: 773
 2299 cells: 495
 2759 cells: 268
 3311 cells: 150
 3973 cells: 100
 4768 cells: 42
 5722 cells: 24
 6866 cells: 12
 8239 cells: 9
 9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 2


From: DuyHai Doan <doanduyhai@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 3:01 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Hot, large row

"How can I detect wide rows?" -->

nodetool cfhistograms <keyspace> <suspected column family>

Look at the "Column count" column (the last one) and identify lines with a very high "Offset" value. In a well-designed application you should see a roughly Gaussian distribution where 80% of your rows have a similar number of columns.

"Anyone know what debug level I can set so that I can see what reads the hot node is handling?" -->

"nodetool settraceprobability <value>", where value is a small number (e.g. 0.001), on the node where you encounter the issue. Activate the tracing for a while (5 minutes), then deactivate it (value = 0). Then look into the system_traces tables "events" & "sessions". It may or may not help, since only about one request in a thousand is traced.

"Any way to get the server to blacklist these wide rows automatically?" --> No


On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kwright@nanigans.com> wrote:

Hi all,

   We are seeing an issue where, basically daily, one of our nodes spikes in load and churns under CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infrequently (<0.001%), and we do our best to detect and delete them, but obviously we're missing a case. Hoping for assistance on the following questions:

  * How can I detect wide rows?
  * Anyone know what debug level I can set so that I can see what reads the hot node is handling? I'm hoping to see the "bad" row.
  * Any way to get the server to blacklist these wide rows automatically?

We're using C* 2.0.6 with Vnodes.

    Thanks
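Putting the tracing advice above into a concrete sequence (a sketch; the host name is a placeholder, and the 0.001 probability and 5-minute window are the values suggested in the thread):

    # sample roughly 1 in 1000 requests on the hot node
    nodetool -h <hot node> settraceprobability 0.001
    # let it run for about 5 minutes while the load spike is occurring
    nodetool -h <hot node> settraceprobability 0
    # then inspect system_traces.sessions and system_traces.events in cqlsh

Because only a small fraction of requests is sampled, the hot partition has to be read often enough during the window to show up; running the window during the spike itself gives the best odds.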