From: Paulo Ricardo Motta Gomes <paulo.motta@chaordicsystems.com>
Date: Thu, 8 May 2014 15:07:10 -0300
Subject: Re: Automatic tombstone removal issue (STCS)
To: user@cassandra.apache.org

I just updated CASSANDRA-6563 with more details and proposed a patch to solve the issue, in case anyone else is interested.

https://issues.apache.org/jira/browse/CASSANDRA-6563

On Tue, May 6, 2014 at 10:00 PM, Paulo Ricardo Motta Gomes <paulo.motta@chaordicsystems.com> wrote:
Robert: thanks for the support, you are right, this belonged more to the dev list but I didn't think of it.

Yuki: thanks a lot for the clarification, this is what I suspected.

I understand it's costly to check row-by-row overlap in order to decide whether an SSTable is a candidate for compaction, but doesn't the compaction process already perform this check when removing tombstones? So, couldn't this check be dropped at decision time, letting the compaction run anyway?

This optimization is especially interesting with large STCS sstables, where the token range will very likely overlap with all other sstables, so it's a pity it's almost never triggered in these cases.
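
To make the suggestion above concrete, here is a rough Python sketch of a single-sstable rewrite that skips the coarse range check and decides per row instead. It is a toy model only, not Cassandra code: the dict-based sstables and the `compact_single` name are hypothetical, and real compaction also has to honour gc_grace and timestamps, which are left out here.

```python
# Toy model, not Cassandra's implementation: an "sstable" is a dict mapping
# row key -> (timestamp, value), where value None stands for a tombstone.

def compact_single(sstable, others):
    """Rewrite one sstable, purging a tombstone only when no other sstable holds that key."""
    out = {}
    for key, (ts, value) in sstable.items():
        is_tombstone = value is None
        exists_elsewhere = any(key in other for other in others)
        if is_tombstone and not exists_elsewhere:
            continue  # safe to drop: nothing for this tombstone to shadow
        out[key] = (ts, value)
    return out

candidate = {"a": (2, None), "b": (2, None), "c": (1, "x")}
others = [{"a": (1, "old")}]
result = compact_single(candidate, others)
```

In this sketch the tombstone for "b" is purged, while the one for "a" is kept because another sstable still holds that key. The per-row lookup is the cost that the token-range check is meant to avoid paying up front.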

On Tue, May 6, 2014 at 9:32 PM, Yuki Morishita <mor.yuki@gmail.com> wrote:
Hi Paulo,

The reason we check overlap is to avoid resurrecting deleted data by
dropping the tombstone marker from only a single SSTable.
We also don't want to check row by row to determine whether an SSTable is
droppable, since that takes time, so we use token ranges to determine if
it MAY have droppable columns.
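
The resurrection concern above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Cassandra's data structures: each sstable is a plain dict, `None` stands for a tombstone, and the `read` helper is a hypothetical name.

```python
# Toy model: each "sstable" maps a row key to (timestamp, value);
# a value of None stands for a tombstone.
TOMBSTONE = None

def read(key, sstables):
    """Return the newest value for key across all sstables (None if deleted or absent)."""
    versions = [t[key] for t in sstables if key in t]
    if not versions:
        return None
    _, value = max(versions, key=lambda v: v[0])
    return value

old_sstable = {"k1": (100, "stale")}    # older write, still on disk
new_sstable = {"k1": (200, TOMBSTONE)}  # later delete

before = read("k1", [old_sstable, new_sstable])

# A single-sstable compaction that blindly purges every tombstone
purged = {k: v for k, v in new_sstable.items() if v[1] is not TOMBSTONE}
after = read("k1", [old_sstable, purged])
```

Before the purge the read sees the delete; afterwards the older value in the other sstable becomes visible again, which is the behaviour the overlap check guards against.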

On Tue, May 6, 2014 at 7:14 PM, Paulo Ricardo Motta Gomes
<paulo.motta@chaordicsystems.com> wrote:
> Hello,
>
> Sorry for being persistent, but I'd love to clear up my understanding on this.
> Has anyone seen single-sstable compaction being triggered for STCS sstables
> with a high tombstone ratio?
>
> Because if the above understanding is correct, the current implementation
> almost never triggers this kind of compaction, since the token ranges of a
> node's sstables almost always overlap. Could this be a bug, or is it expected
> behavior?
>
> Thank you,
>
>
>
> On Mon, May 5, 2014 at 8:59 AM, Paulo Ricardo Motta Gomes
> <paulo.motta@chaordicsystems.com> wrote:
>>
>> Hello,
>>
>> After noticing that automatic tombstone removal (CASSANDRA-3442) was not
>> working in an append-only STCS CF with a 40% droppable tombstone ratio, I
>> investigated why the compaction was not being triggered for the largest
>> SSTable, which is 16GB with about a 70% droppable tombstone ratio.
>>
>> When the code checks whether the SSTable is a candidate for compaction
>> (AbstractCompactionStrategy.worthDroppingTombstones), it checks whether the
>> other SSTables overlap with the current SSTable by comparing their start
>> and end tokens. The problem is that all SSTables contain pretty much
>> the whole node token range, so all of them overlap nearly all the time, and
>> the automatic tombstone removal never happens. Is there any case in STCS
>> where the sstables' token ranges DO NOT overlap?
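
A minimal Python sketch of the coarse check as described in the paragraph above, assuming each sstable is summarized only by its first and last token. The `TokenRange` class and `overlaps_any` helper are hypothetical names for illustration, not Cassandra's API, and wrap-around ranges are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenRange:
    """Inclusive [first, last] token bounds of one sstable (no wrap-around handling)."""
    first: int
    last: int

    def intersects(self, other):
        return self.first <= other.last and other.first <= self.last

def overlaps_any(candidate, others):
    """True if the candidate's token bounds intersect those of any other sstable."""
    return any(candidate.intersects(o) for o in others)

# STCS-like situation: every sstable spans almost the whole node range
big = TokenRange(5, 995)
stcs_blocked = overlaps_any(big, [TokenRange(10, 990), TokenRange(1, 999)])

# Disjoint bounds: the only case where the coarse check lets the compaction through
disjoint = overlaps_any(TokenRange(0, 100), [TokenRange(101, 200), TokenRange(500, 600)])
```

With bounds like these the check is almost always positive for sstables that each cover most of the node's range, regardless of how many rows they actually share.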
>>
>> I understand that during the tombstone removal process it's necessary to verify
>> whether the compacted row exists in any other SSTable, but I don't understand why
>> it's necessary to verify whether the token ranges overlap to decide if a
>> tombstone compaction must be executed on a single SSTable with a high
>> droppable tombstone ratio.
>>
>> Any clarification would be kindly appreciated.
>>
>> PS: Cassandra version: 1.2.16
>>
>> --
>> Paulo Motta
>>
>> Chaordic | Platform
>> www.chaordic.com.br
>> +55 48 3232.3200
>
>
>
>
> --
> Paulo Motta
>
> Chaordic | Platform
> www.chaordic.com.br
> +55 48 3232.3200



--
Yuki Morishita
t:yukim (http://twitter.com/yukim)



--
Paulo Motta



--
Paulo Motta
