From: Sebastian Estevez
Date: Fri, 21 Aug 2015 13:30:55 -0400
Subject: Re: Question about how to remove data
To: user@cassandra.apache.org

To clarify, you do not need a TTL for deletes to be compacted away in
Cassandra. When you delete, we create a tombstone, which will remain in the
system for __at least__ gc_grace_seconds.
We wait this long to give the tombstone a chance to reach all replica
nodes. The best practice is to run repairs at least once every
gc_grace_seconds in order to avoid edge cases where data comes back to life
(i.e. the tombstone was never sent to one of your replicas, and once the
tombstones and data are removed from the other two replicas, all that is
left is the old value).

__at least__ are the key words in the previous paragraph: there are more
conditions that need to be met in order for a tombstone to actually get
cleaned up. Like most things in Cassandra, these conditions are configurable
(via the following compaction sub-properties):

http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_configure_compaction_t.html

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com

On Thu, Aug 20, 2015 at 4:13 PM, Daniel Chia wrote:

> The TTL shouldn't matter if you deleted the data, since to my
> understanding the delete should shadow the data, signaling to C* that the
> data is a candidate for removal on compaction.
>
> Others might know better, but it could very well be the fact that
> gc_grace_seconds is 0 that is causing your problems. Others might have
> other suggestions, but you could potentially use sstable2json to see the
> raw contents of the sstable on disk and see why data is still there.
>
> Thanks,
> Daniel
>
> On Thu, Aug 20, 2015 at 12:55 PM, Analia Lorenzatto <analialorenzatto@gmail.com> wrote:
>
>> Hello,
>>
>> Daniel, I am using Size Tiered compaction.
>>
>> My concern is that I do not have a TTL defined on the Column family,
>> and I do not have the possibility to create one. Perhaps the "deleted
>> data" is never actually going to be removed?
>>
>> Thanks a lot!
>>
>>
>> On Thu, Aug 20, 2015 at 4:24 AM, Daniel Chia wrote:
>>
>>> Is this a LCS family, or Size Tiered? Manually running compaction on LCS
>>> doesn't do anything until C* 2.2
>>> (https://issues.apache.org/jira/browse/CASSANDRA-7272)
>>>
>>> Thanks,
>>> Daniel
>>>
>>> On Wed, Aug 19, 2015 at 6:56 PM, Analia Lorenzatto <analialorenzatto@gmail.com> wrote:
>>>
>>>> Hello Michael,
>>>>
>>>> Thanks for responding!
>>>>
>>>> I do not have snapshots on any node of the cluster.
>>>>
>>>> Saludos / Regards.
>>>>
>>>> Analía Lorenzatto.
>>>>
>>>> "Happiness is not something really made. It comes from your own actions"
>>>> by Dalai Lama
>>>>
>>>>
>>>> On 19 Aug 2015 6:19 pm, "Laing, Michael" wrote:
>>>>
>>>>> Possibly you have snapshots? If so, use nodetool to clear them.
>>>>>
>>>>> On Wed, Aug 19, 2015 at 4:54 PM, Analia Lorenzatto <analialorenzatto@gmail.com> wrote:
>>>>>
>>>>>> Hello guys,
>>>>>>
>>>>>> I have a Cassandra 2.1 cluster comprised of 4 nodes.
>>>>>>
>>>>>> I removed a lot of data in a Column Family, then I manually ran a
>>>>>> compaction on this Column family on every node. After doing that, if I
>>>>>> query that data, Cassandra correctly says this data is not there. But the
>>>>>> space on disk is exactly the same as before removing that data.
>>>>>>
>>>>>> Also, I realized that gc_grace_seconds = 0. Some people on the
>>>>>> internet say that it could produce zombie data. What do you think?
>>>>>>
>>>>>> I do not have a TTL defined on the Column family, and I do not have
>>>>>> the possibility to create one.
>>>>>> So my question is: given that I do not have a TTL defined, is the
>>>>>> data going to be removed? Or is the deleted data never actually
>>>>>> going to be deleted because I do not have a TTL?
>>>>>>
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>> --
>>>>>> Saludos / Regards.
>>>>>>
>>>>>> Analía Lorenzatto.
>>>>>>
>>>>>> "It's possible to commit no errors and still lose. That is not
>>>>>> weakness. That is life". By Captain Jean-Luc Picard.
>>>>>>
>>>>>
>>>>>
>>>
>>
>>
>> --
>> Saludos / Regards.
>>
>> Analía Lorenzatto.
>>
>> "It's possible to commit no errors and still lose. That is not weakness.
>> That is life". By Captain Jean-Luc Picard.
>>
>
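
A toy model of the rule described in the reply at the top of this message may help make the "zombie data" case concrete. This is only a sketch in Python under stated assumptions: `Cell`, `purgeable`, `compact`, `repair`, `read`, and `simulate` are hypothetical names invented for illustration, not Cassandra APIs, and the model deliberately ignores the additional purge conditions controlled by the compaction sub-properties. It assumes three replicas, one of which never received the delete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (a delete marker)
    write_time: int       # seconds; when the write or delete was issued


def purgeable(cell: Cell, now: int, gc_grace_seconds: int) -> bool:
    # Assumed rule: a tombstone may be dropped only once gc_grace_seconds
    # have elapsed since the delete.
    return cell.value is None and cell.write_time + gc_grace_seconds < now


def compact(cells: List[Cell], now: int, gc_grace_seconds: int) -> List[Cell]:
    # Keep only the newest version of the key on this replica, and drop it
    # entirely if it is a tombstone that is past gc grace.
    if not cells:
        return []
    newest = max(cells, key=lambda c: c.write_time)
    return [] if purgeable(newest, now, gc_grace_seconds) else [newest]


def repair(replicas: List[List[Cell]]) -> List[List[Cell]]:
    # Toy repair: every replica ends up with every cell seen on any replica.
    merged = [c for r in replicas for c in r]
    return [list(merged) for _ in replicas]


def read(replicas: List[List[Cell]]) -> Optional[str]:
    # Read from all replicas; the newest cell wins (None means "deleted").
    cells = [c for r in replicas for c in r]
    if not cells:
        return None
    return max(cells, key=lambda c: c.write_time).value


def simulate(gc_grace_seconds: int, compact_at: int, repair_first: bool) -> Optional[str]:
    live = Cell("v1", write_time=0)
    tomb = Cell(None, write_time=10)
    # The third replica never received the delete.
    replicas = [[live, tomb], [live, tomb], [live]]
    if repair_first:
        replicas = repair(replicas)
    replicas = [compact(r, compact_at, gc_grace_seconds) for r in replicas]
    return read(replicas)


if __name__ == "__main__":
    print(simulate(0, 20, False))     # v1   (zombie: gc_grace_seconds = 0)
    print(simulate(100, 20, False))   # None (tombstone still shadows the data)
    print(simulate(100, 200, False))  # v1   (zombie: no repair within gc grace)
    print(simulate(100, 200, True))   # None (repair spread the tombstone first)
```

In this model the old value only reappears when the tombstone is purged before it has reached every replica, which is why a gc_grace_seconds of 0 is risky and why repairs are recommended within each gc grace window.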