From: Stefano Ortolani <ostefano@gmail.com>
Date: Fri, 12 May 2017 14:22:30 +0100
Subject: Re: LCS, range tombstones, and eviction
To: Blake Eggleston <beggleston@apple.com>
Cc: user@cassandra.apache.org

Thanks a lot Blake, that definitely helps!

I actually found a ticket regarding range tombstones and how they are
accounted for: https://issues.apache.org/jira/browse/CASSANDRA-8527

I am wondering now what happens when a node receives a read request.
Are the range tombstones read before scanning the SSTables? More
interestingly, given that a single partition might be split across
different levels, and that some range tombstones might be in L0 while
all the rest of the data is in L1, are all the tombstones prefetched
from _all_ the involved SSTables before doing any table scan?

Regards,
Stefano

On Thu, May 11, 2017 at 7:58 PM, Blake Eggleston <beggleston@apple.com> wrote:
> Hi Stefano,
>
>> Based on what I understood reading the docs, if the ratio of garbage
>> collectable tombstones exceeds the "tombstone_threshold", C* should
>> start compacting and evicting.
>
> If there are no other normal compaction tasks to be run, LCS will
> attempt to compact the sstables it estimates it will be able to drop
> the most tombstones from. It does this by estimating the number of
> tombstones an sstable has that have passed the gc grace period.
> Whether or not a tombstone will actually be evicted is more
> complicated. Even if a tombstone has passed gc grace, it can't be
> dropped if the data it's deleting still exists in another sstable;
> otherwise the data would appear to return. So a tombstone won't be
> dropped if there is data for the same partition in other sstables
> that is older than the tombstone being evaluated for eviction.
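
For concreteness, both of the knobs discussed in this thread are
per-table compaction subproperties. A minimal CQL sketch, with an
illustrative keyspace/table name; the values shown are just the
documented defaults for LCS:

    -- 'tombstone_threshold': estimated fraction of droppable
    -- (gc-grace-expired) tombstones above which an sstable becomes a
    -- candidate for a single-sstable tombstone compaction.
    -- 'unchecked_tombstone_compaction': if 'true', run that compaction
    -- without the cross-sstable overlap check described above.
    ALTER TABLE my_ks.my_table WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'tombstone_threshold': '0.2',
        'unchecked_tombstone_compaction': 'false'
    };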

>> I am quite puzzled however by what might happen when dealing with
>> range tombstones. In that case a single tombstone might actually
>> stand for an arbitrary number of normal tombstones. In other words,
>> do range tombstones contribute to the "tombstone_threshold"? If so,
>> how?
>
> From what I can tell, each end of the range tombstone is counted as a
> single tombstone. So a range tombstone effectively contributes '2' to
> the count of tombstones for an sstable. I'm not 100% sure, but I
> haven't seen any sstable writing logic that tracks open tombstones
> and counts covered cells as tombstones. So it's likely that the
> effect of range tombstones covering many rows is underrepresented in
> the droppable tombstone estimate.
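
As an illustration, against a hypothetical table with clustering
column "ts": the single statement below writes one range tombstone no
matter how many rows fall inside the range (range deletes of this form
need Cassandra 3.0 or later), so per the counting above it would add
roughly 2 to the sstable's tombstone estimate rather than one per
covered row:

    -- hypothetical schema: PRIMARY KEY (device_id, ts)
    DELETE FROM my_ks.events
    WHERE device_id = 42
      AND ts >= '2017-01-01' AND ts < '2017-04-01';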

>> I am also a bit confused by the "tombstone_compaction_interval". If
>> I am dealing with a big partition in LCS which is receiving new
>> records every day, and a weekly incremental repair job continuously
>> anticompacting the data and thus creating SSTables, what is the
>> likelihood of the default interval (10 days) actually being hit?
>
> It will be hit, but probably only in the repaired data. Once the data
> is marked repaired, it shouldn't be anticompacted again, and should
> get old enough to pass the compaction interval. That shouldn't be an
> issue though, because you should be running repair often enough that
> data is repaired before it can ever get past the gc grace period.
> Otherwise you'll have other problems. Also, keep in mind that
> tombstone eviction is a part of all compactions; it's just that
> occasionally a compaction is run specifically for that purpose.
> Finally, you probably shouldn't run incremental repair on data that
> is deleted. There is a design flaw in the incremental repair used in
> versions of Cassandra before 4.0 that can cause consistency issues.
> It can also cause a *lot* of overstreaming, so you might want to take
> a look at how much streaming your cluster is doing with full repairs
> and incremental repairs. It might actually be more efficient to run
> full repairs.
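
Since gc grace does the real gating here, the related per-table option
for reference; 864000 seconds (10 days) is the shipped default. Note
that the compaction docs list tombstone_compaction_interval's default
as 86400 seconds (one day), so the 10-day figure above matches the
gc_grace_seconds default rather than that option's:

    ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 864000;  -- 10 days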

> Hope that helps,
>
> Blake

> On May 11, 2017 at 7:16:26 AM, Stefano Ortolani (ostefano@gmail.com) wrote:

>> Hi all,
>>
>> I am trying to wrap my head around how C* evicts tombstones when
>> using LCS. Based on what I understood reading the docs, if the ratio
>> of garbage collectable tombstones exceeds the "tombstone_threshold",
>> C* should start compacting and evicting.
>>
>> I am quite puzzled however by what might happen when dealing with
>> range tombstones. In that case a single tombstone might actually
>> stand for an arbitrary number of normal tombstones. In other words,
>> do range tombstones contribute to the "tombstone_threshold"? If so,
>> how?
>>
>> I am also a bit confused by the "tombstone_compaction_interval". If
>> I am dealing with a big partition in LCS which is receiving new
>> records every day, and a weekly incremental repair job continuously
>> anticompacting the data and thus creating SSTables, what is the
>> likelihood of the default interval (10 days) actually being hit?
>>
>> Hopefully somebody will be able to shed some light here!
>>
>> Thanks in advance!
>> Stefano
