From: Keith Wright <kwright@nanigans.com>
To: user@cassandra.apache.org
Cc: Don Jackson
Date: Mon, 28 Jul 2014 08:48:58 -0500
Subject: Re: Hot, large row
I don't know, but my guess is it would be without tombstones. I did more research this weekend (note that my Sunday was largely interrupted by again seeing a node go to high load/high CMS for ~3 hours) and came across this presentation: http://www.slideshare.net/mobile/planetcassandra/8-axel-liljencrantz-23204252

I definitely suggest you give this a look; it is very informative. The important takeaway is that they ran into the same issue as I have, from using the same model: I update the same row over time with a TTL, which causes the row to fragment across SSTables, and once a row is spread across 4+ SSTables, compaction can never actually remove its tombstones.

As I see it, I have the following options and was hoping to get some advice (rough sketches of options 1-3 follow the list):

1. Modify my write structure to include time within the key. Currently we want to read all of a row, but I can likely add the month to the key, and it would be OK for the application to do two reads to get the most recent data (to deal with month boundaries). This would contain the fragmentation to one month (see the first sketch after this list).

2. Following on from item #1, it appears from CASSANDRA-5514 that if I include time within my query, Cassandra will not bother going through older SSTables, which should reduce the impact of the row fragmentation (the bounded read in the first sketch below). The problem here is that my data footprint will likely still grow over time, as tombstones will never be removed.

3. Move from LCS to STCS and run full compactions periodically to clean up tombstones (see the second sketch below).
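To make options 1 and 2 concrete, here is a rough sketch using the 2.0 Java driver. The keyspace, table, column names, and the 30-day TTL are all made up for illustration, since our real schema isn't shown in this thread:

    import java.util.Date;
    import java.util.UUID;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class MonthBucketSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("app");   // hypothetical keyspace

            // Option 1: fold a month bucket into the partition key so a user's
            // writes only ever fragment within that month's partition.
            session.execute(
                "CREATE TABLE IF NOT EXISTS user_events_by_month ("
              + "  user_id    uuid,"
              + "  month      int,"           // e.g. 201407, derived from event time
              + "  event_time timestamp,"
              + "  value      timestamp,"
              + "  PRIMARY KEY ((user_id, month), event_time)"
              + ") WITH CLUSTERING ORDER BY (event_time DESC)"
              + " AND default_time_to_live = 2592000");  // 30 days, in seconds

            // Option 2 (CASSANDRA-5514): a bound on the clustering column lets
            // Cassandra skip SSTables whose newest data falls outside the range,
            // so older fragments of the row are never touched on read.
            UUID userId = UUID.randomUUID();  // stand-in for a real user id
            Date since = new Date(System.currentTimeMillis() - 86400000L);  // last 24h
            ResultSet rs = session.execute(new SimpleStatement(
                "SELECT value FROM user_events_by_month"
              + " WHERE user_id = ? AND month = ? AND event_time >= ?",
                userId, 201407, since));
            for (Row row : rs) {
                System.out.println(row.getDate("value"));
            }

            // Month-boundary case: issue a second read against the previous
            // bucket (e.g. month = 201406) and merge the two result sets.

            cluster.close();
        }
    }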
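And a minimal sketch of option 3, again against a made-up table name. Note the ALTER only switches the strategy going forward; the periodic full compaction still has to be scheduled outside CQL (e.g. nodetool from cron):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SwitchToStcs {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Option 3: move the table (app.user_events is a stand-in name)
            // from leveled to size-tiered compaction.
            session.execute(
                "ALTER TABLE app.user_events"
              + " WITH compaction = {'class': 'SizeTieredCompactionStrategy'}");

            // Tombstone cleanup is then a scheduled major compaction, e.g.:
            //   nodetool compact app user_events
            // Caveat: a major compaction under STCS leaves one huge SSTable,
            // which has operational downsides of its own.

            cluster.close();
        }
    }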

I appreciate the help!

From: Jack Krupansky <jack@basetechnology.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, July 25, 2014 at 11:15 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Hot, large row

Is it the accumulated tombstones on a row that make it act as if "wide"? Does cfhistograms count the tombstones or subtract them when reporting on cell count for rows? (I don't know.)
 
-- Jack Krupansky
 
From: Keith Wright <kwright@nanigans.com>
Sent: Friday, July 25, 2014 10:24 AM
To: user@cassandra.apache.org
Cc: Don Jackson
Subject: Re: Hot, large row
 
Ha, check out who filed that ticket! Yes, I'm aware of it. My hope is that it was mostly addressed in CASSANDRA-6563, so I may upgrade from 2.0.6 to 2.0.9. I'm really just surprised that others are not doing similar things as I am and thus running into similar issues.
 
To answer DuyHai's questions:

How many nodes do you have? And roughly how many distinct user_ids are there?
 
- 14 nodes with approximately 250 million distinct user_ids
 
For GC activity, in general we see low GC pressure in both ParNew and CMS (we see the occasional CMS spike, but it's usually under 100 ms). When we see a node locked up in CMS GC, it's not that any one GC takes a long time; it's that the constant back-to-back collections cause the read latency to spike from the usual 3-5 ms up to 35 ms, which causes issues for our application.
 
Also, Jack Krupansky's question is interesting. Even though you limit a request to 5000 cells, if each cell is a big blob or block of text, it may add up to a lot of JVM heap ...
- The column values are actually timestamps, and thus not variable in length, and we cap the length of the other columns used in the primary key, so I find it VERY unlikely that this is the cause.
 
I will look into the paging option with the native client, but from the docs it appears that it's enabled by default, right?
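If I read the 2.0 Java driver docs correctly, automatic paging is indeed on by default with a 5000-row fetch size, and it can be tuned per statement. A minimal sketch, with made-up names again:

    import java.util.UUID;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class PagingSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("app");  // hypothetical keyspace

            // Lowering the fetch size caps how much of a hot, wide row is
            // held in the JVM heap at once (default is 5000 rows per page).
            Statement stmt = new SimpleStatement(
                "SELECT * FROM user_events WHERE user_id = ?",
                UUID.randomUUID());
            stmt.setFetchSize(500);   // 500 rows per page instead of 5000

            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {      // iterating fetches later pages lazily
                System.out.println(row);
            }

            cluster.close();
        }
    }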
 
I greatly appreciate all the help!
From: Ken Hancock <ken.hancock@schange.com>
Reply-To: "= ;user@cassandra.apache.org= " <user@cassandra.apac= he.org>
Date: Friday, July 25, 2014 at 10:06 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Don Jackson <djackson@nanigans.com>
Subject: Re: Hot, large row
 
https://issues.apache.org/jira/browse/CASSANDRA-6654