From: Paul Chandler
To: user@cassandra.apache.org
Subject: Re: TWCS and gc_grace_seconds
Date: Fri, 18 Oct 2019 14:34:24 +0100

Hi Adarsh,

You will have problems if you manually delete data when using TWCS.

To fully understand why, I recommend reading this The Last Pickle post: https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
And this post I wrote that dives deeper into the problems with deletes: http://www.redshots.com/cassandra-twcs-must-have-ttls/

Thanks 

Paul
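
A rough illustration of why manual deletes and TWCS do not mix, as a minimal Python sketch rather than anything taken from Cassandra itself. It assumes that TWCS groups SSTables into fixed-size windows by write time and only compacts SSTables within the same window. The names `window_start` and `WINDOW_SECONDS`, the one-day window size, and the example dates are all made up for the illustration.

```python
from datetime import datetime, timezone

# Hypothetical window size for illustration: one day per TWCS window.
WINDOW_SECONDS = 24 * 60 * 60


def window_start(write_time: datetime, window_seconds: int = WINDOW_SECONDS) -> datetime:
    """Return the start of the fixed-size time window that contains write_time."""
    epoch = int(write_time.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)


# A row written on 1 Oct and manually deleted on 18 Oct (example dates only).
row_written = datetime(2019, 10, 1, 9, 0, tzinfo=timezone.utc)
tombstone_written = datetime(2019, 10, 18, 13, 0, tzinfo=timezone.utc)

data_window = window_start(row_written)
tombstone_window = window_start(tombstone_written)

# The tombstone lands in a different window from the data it shadows, so under
# the assumption above the two are never compacted together.
print(data_window.date(), tombstone_window.date(), data_window == tombstone_window)
```

With TTLs, by contrast, the expiry travels with the data inside its own window, which is the case the two posts above describe as working well.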

On 18 Oct 2019, at 14:22, Adarsh Kumar <adarsh0007@gmail.com> wrote:

Thanks Jeff,


I just checked with the business and we have differences in having TTL, so it will always be manual purging. We do not want to use LCS due to its high IO.
So:
  1. As the use case is a time-series data model, will TWCS give some benefit (without TTL) with frequently deleted data?
  2. Are there any best practices/recommendations for handling a high number of tombstones?
  3. Can we also handle this use case with STCS (with some configuration)?

Thanks in advance

Adarsh Kumar

On Fri, Oct 18, 2019 at 11:46 AM Jeff Jirsa <jjirsa@gmail.com> wrote:
Is everything in the table TTL'd?

Do you do explicit deletes before the data is expected to expire?

Generally speaking, gcgs exists to prevent data resurrection. But ttl'd data can't be resurrected once it expires, so gcgs has no purpose unless you're deleting it before the ttl expires. If you're doing that, twcs won't be able to drop whole sstables anyway, so maybe LCS will be less disk usage (but much higher IO)
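
To make the gc_grace_seconds reasoning concrete, here is a small Python sketch of the timing rule only. It assumes the usual rule that compaction may purge a tombstone once gc_grace_seconds have passed since the delete, and that every replica needs to have received the tombstone (normally via repair) before that point. The function names and dates are hypothetical, and none of this is Cassandra code.

```python
from datetime import datetime, timedelta, timezone


def tombstone_purgeable_at(deleted_at: datetime, gc_grace_seconds: int) -> datetime:
    """Earliest time compaction may drop a tombstone written at deleted_at."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


def resurrection_risk(deleted_at: datetime, repair_done: datetime, gc_grace_seconds: int) -> bool:
    """True if the tombstone could be purged before repair has spread it to all replicas."""
    return repair_done > tombstone_purgeable_at(deleted_at, gc_grace_seconds)


# Example dates only: a delete followed by a repair that completes a week later.
deleted_at = datetime(2019, 10, 18, tzinfo=timezone.utc)
weekly_repair_done = deleted_at + timedelta(days=7)

# With the default gc_grace_seconds of 864000 (10 days) the repair finishes in time.
print(resurrection_risk(deleted_at, weekly_repair_done, 864000))
# Reduced to one day, the tombstone can be purged before the repair completes.
print(resurrection_risk(deleted_at, weekly_repair_done, 86400))
```

The practical reading is that lowering gc_grace_seconds on a table with explicit deletes shortens the interval within which repair has to complete.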

On Oct 17, 2019, at 10:36 PM, Adarsh Kumar <adarsh0007@gmail.com> wrote:

Hi,

We have a use case of time series data with TTL where we want to use TimeWindowCompactionStrategy because of its better management of TTLs and tombstones. In this case, the data we have is frequently deleted, so we want to reduce gc_grace_seconds to shorten the tombstones' life and reduce pressure on storage. I have the following questions:
  1. Do we always need to run repair on the table within the reduced gc_grace_seconds, or is there another way to manage repairs in this case?
  2. Do we have any other strategy (or combination of strategies) to manage frequently deleted time-series data?
Thanks in advance.

Adarsh Kumar
