From: Keith Wright <kwright@nanigans.com>
To: user@cassandra.apache.org
Cc: Don Jackson
Date: Mon, 28 Jul 2014 08:48:58 -0500
Subject: Re: Hot, large row
I don't know, but my guess is it would be without tombstones. I did more research this weekend (note that my Sunday was largely interrupted by again seeing a node go to high load/high CMS for ~3 hours) and came across this presentation: http://www.slideshare.net/mobile/planetcassandra/8-axel-liljencrantz-23204252

I definitely suggest you give this a look; it is very informative. The important takeaway is that they ran into the same issue as I have, from using the same model: I update the same row over time with a TTL, which causes the row to fragment across SSTables, and once a row is spread across 4+ SSTables, compaction can never actually remove its tombstones.

As I see it, I have the following options and was hoping to get some advice (rough sketches of options 1-3 follow the list):

1. Modify my write structure to include time within the key. Currently we want to read all of a row, but I can likely add the month to the key, and it would be OK for the application to do two reads to get the most recent data (to deal with month boundaries). This would contain the fragmentation to one month (see the first sketch after this list).

2. Following on from item #1, it appears from CASSANDRA-5514 that if I include time within my query, Cassandra will not bother going through older SSTables, which should reduce the impact of the row fragmentation (the bounded read in the first sketch below). The problem here is that my data footprint will likely still grow over time, as tombstones will never be removed.

3. Move from LCS to STCS and run full compactions periodically to clean up tombstones (see the second sketch below).
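To make options 1 and 2 concrete, here is a rough sketch using the 2.0 Java driver. The keyspace, table, column names, and the 30-day TTL are all made up for illustration, since our real schema isn't shown in this thread:

    import java.util.Date;
    import java.util.UUID;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class MonthBucketSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("app");   // hypothetical keyspace

            // Option 1: fold a month bucket into the partition key so a user's
            // writes only ever fragment within that month's partition.
            session.execute(
                "CREATE TABLE IF NOT EXISTS user_events_by_month ("
              + "  user_id    uuid,"
              + "  month      int,"           // e.g. 201407, derived from event time
              + "  event_time timestamp,"
              + "  value      timestamp,"
              + "  PRIMARY KEY ((user_id, month), event_time)"
              + ") WITH CLUSTERING ORDER BY (event_time DESC)"
              + " AND default_time_to_live = 2592000");  // 30 days, in seconds

            // Option 2 (CASSANDRA-5514): a bound on the clustering column lets
            // Cassandra skip SSTables whose newest data falls outside the range,
            // so older fragments of the row are never touched on read.
            UUID userId = UUID.randomUUID();  // stand-in for a real user id
            Date since = new Date(System.currentTimeMillis() - 86400000L);  // last 24h
            ResultSet rs = session.execute(new SimpleStatement(
                "SELECT value FROM user_events_by_month"
              + " WHERE user_id = ? AND month = ? AND event_time >= ?",
                userId, 201407, since));
            for (Row row : rs) {
                System.out.println(row.getDate("value"));
            }

            // Month-boundary case: issue a second read against the previous
            // bucket (e.g. month = 201406) and merge the two result sets.

            cluster.close();
        }
    }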
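And a minimal sketch of option 3, again against a made-up table name. Note the ALTER only switches the strategy going forward; the periodic full compaction still has to be scheduled outside CQL (e.g. nodetool from cron):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SwitchToStcs {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Option 3: move the table (app.user_events is a stand-in name)
            // from leveled to size-tiered compaction.
            session.execute(
                "ALTER TABLE app.user_events"
              + " WITH compaction = {'class': 'SizeTieredCompactionStrategy'}");

            // Tombstone cleanup is then a scheduled major compaction, e.g.:
            //   nodetool compact app user_events
            // Caveat: a major compaction under STCS leaves one huge SSTable,
            // which has operational downsides of its own.

            cluster.close();
        }
    }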

I appreciate the help!

From: Jack Krupansky <jack@basetechnology.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, July 25, 2014 at 11:15 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Hot, large row

Is it the accumulated tombstones on a row that make it act as if "wide"? Does cfhistograms count the tombstones or subtract them when reporting on cell count for rows? (I don't know.)
 
-- Jack Krupansky
 
From: Keith Wright <kwright@nanigans.com>
Sent: Friday, July 25, 2014 10:24 AM
To: user@cassandra.apache.org
Cc: Don Jackson
Subject: Re: Hot, large row
 
Ha, check out who filed that ticket! Yes, I'm aware of it. My hope is that it was mostly addressed in CASSANDRA-6563, so I may upgrade from 2.0.6 to 2.0.9. I'm really just surprised that others are not doing similar things as I am and thus running into similar issues.
 
To answer DuyHai's questions:

How many nodes do you have? And roughly how many distinct user_ids are there?
 
- 14 nodes with approximately 250 million distinct user_ids
 
For GC activity, in general we see low GC pressure in both ParNew and CMS (we see the occasional CMS spike, but it's usually under 100 ms). When we see a node locked up in CMS GC, it's not that any one GC takes a long time; it's that the constant back-to-back collections cause the read latency to spike from the usual 3-5 ms up to 35 ms, which causes issues for our application.
 
Also, Jack Krupansky's question is interesting. Even though you limit a request to 5000 cells, if each cell is a big blob or block of text, it may add up to a lot of JVM heap ...
- The column values are actually timestamps, and thus not variable in length, and we cap the length of the other columns used in the primary key, so I find it VERY unlikely that this is the cause.
 
I will look into the paging option with the native client, but from the docs it appears that it's enabled by default, right?
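If I read the 2.0 Java driver docs correctly, automatic paging is indeed on by default with a 5000-row fetch size, and it can be tuned per statement. A minimal sketch, with made-up names again:

    import java.util.UUID;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class PagingSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("app");  // hypothetical keyspace

            // Lowering the fetch size caps how much of a hot, wide row is
            // held in the JVM heap at once (default is 5000 rows per page).
            Statement stmt = new SimpleStatement(
                "SELECT * FROM user_events WHERE user_id = ?",
                UUID.randomUUID());
            stmt.setFetchSize(500);   // 500 rows per page instead of 5000

            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {      // iterating fetches later pages lazily
                System.out.println(row);
            }

            cluster.close();
        }
    }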
 
I greatly appreciate all the help!
From: Ken Hancock <ken.hancock@schange.com>
Reply-To: "= ;user@cassandra.apache.org= " <user@cassandra.apac= he.org>
Date: Friday, July 25, 2014 at 10:06 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Don Jackson <djackson@nanigans.com>
Subject: Re: Hot, large row
 
https://issues.apache.org/jira/browse/CASSANDRA-6654