Date: Fri, 12 May 2017 10:56:48 -0700
From: Blake Eggleston
To: Stefano Ortolani, user@cassandra.apache.org
Subject: Re: LCS, range tombstones, and eviction

The start and end points of a range tombstone are
basically stored as special-purpose rows alongside the normal data in an sstable. As part of a read, they're reconciled with the data from the other sstables into a single partition, just like the other rows. The only difference is that they don't contain any 'real' data, and, of course, they prevent 'deleted' data from being returned in the read. It's a bit more complicated than that, but that's the general idea.

On May 12, 2017 at 6:23:01 AM, Stefano Ortolani (ostefano@gmail.com) wrote:

Thanks a lot Blake, that definitely helps!

I actually found a ticket about range tombstones and how they are accounted for: https://issues.apache.org/jira/browse/CASSANDRA-8527

I am wondering now what happens when a node receives a read request. Are the range tombstones read before scanning the SSTables? More interestingly, given that a single partition might be split across different levels, and that some range tombstones might be in L0 while all the rest of the data is in L1, are all the tombstones prefetched from _all_ the involved SSTables before doing any table scan?

Regards,
Stefano

On Thu, May 11, 2017 at 7:58 PM, Blake Eggleston wrote:

Hi Stefano,

> Based on what I understood reading the docs, if the ratio of garbage collectable tombstones exceeds the "tombstone_threshold", C* should start compacting and evicting.

If there are no other normal compaction tasks to be run, LCS will attempt to compact the sstables it estimates it will be able to drop the most tombstones from. It does this by estimating the number of tombstones an sstable has that have passed the gc grace period. Whether or not a tombstone will actually be evicted is more complicated. Even if a tombstone has passed gc grace, it can't be dropped if the data it's deleting still exists in another sstable, otherwise the deleted data would reappear.
So, a tombstone won't be dropped if there is data for the same partition in other sstables that is older than the tombstone being evaluated for eviction.

> I am quite puzzled however by what might happen when dealing with range tombstones. In that case a single tombstone might actually stand for an arbitrary number of normal tombstones. In other words, do range tombstones contribute to the "tombstone_threshold"? If so, how?

From what I can tell, each end of the range tombstone is counted as a single tombstone. So a range tombstone effectively contributes '2' to the count of tombstones for an sstable. I'm not 100% sure, but I haven't seen any sstable writing logic that tracks open tombstones and counts covered cells as tombstones. So, it's likely that the effect of range tombstones covering many rows is underrepresented in the droppable tombstone estimate.

> I am also a bit confused by the "tombstone_compaction_interval". If I am dealing with a big partition in LCS which is receiving new records every day, and a weekly incremental repair job continuously anticompacting the data and thus creating SSTables, what is the likelihood of the default interval (10 days) to be actually hit?

It will be hit, but probably only in the repaired data. Once the data is marked repaired, it shouldn't be anticompacted again, and should get old enough to pass the compaction interval. That shouldn't be an issue though, because you should be running repair often enough that data is repaired before it can ever get past the gc grace period. Otherwise you'll have other problems. Also, keep in mind that tombstone eviction is a part of all compactions; it's just that occasionally a compaction is run specifically for that purpose. Finally, you probably shouldn't run incremental repair on data that is deleted.
There is a design flaw in the incremental repair used in pre-4.0 versions of Cassandra that can cause consistency issues. It can also cause a *lot* of overstreaming, so you might want to take a look at how much streaming your cluster is doing with full repairs and with incremental repairs. It might actually be more efficient to run full repairs.

Hope that helps,

Blake

On May 11, 2017 at 7:16:26 AM, Stefano Ortolani (ostefano@gmail.com) wrote:

Hi all,

I am trying to wrap my head around how C* evicts tombstones when using LCS.

Based on what I understood reading the docs, if the ratio of garbage collectable tombstones exceeds the "tombstone_threshold", C* should start compacting and evicting.

I am quite puzzled however by what might happen when dealing with range tombstones. In that case a single tombstone might actually stand for an arbitrary number of normal tombstones. In other words, do range tombstones contribute to the "tombstone_threshold"? If so, how?

I am also a bit confused by the "tombstone_compaction_interval". If I am dealing with a big partition in LCS which is receiving new records every day, and a weekly incremental repair job continuously anticompacting the data and thus creating SSTables, what is the likelihood of the default interval (10 days) to be actually hit?

Hopefully somebody will be able to shed some light here!

Thanks in advance!
Stefano
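
Postscript to the thread: the eviction rule described in Blake's May 11 reply (a tombstone must have passed gc grace, and no other sstable may hold older data for the same partition) can be sketched roughly as below. This is only an illustration under stated assumptions, not Cassandra's actual code: `SSTableSummary`, `Tombstone`, `can_evict`, and their fields are hypothetical names invented for the example, and all times are plain integers in one shared unit.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SSTableSummary:
    """Hypothetical stand-in for an sstable; not a Cassandra class."""
    name: str
    # Oldest write timestamp held for each partition key in this sstable.
    min_timestamp_by_partition: Dict[str, int]


@dataclass
class Tombstone:
    """Hypothetical tombstone record; times share one integer unit."""
    partition: str
    timestamp: int   # write timestamp of the deletion
    deleted_at: int  # time at which the deletion was issued


def can_evict(tombstone: Tombstone,
              other_sstables: Iterable[SSTableSummary],
              gc_grace: int,
              now: int) -> bool:
    """Return True only if both conditions from the thread hold."""
    # Condition 1: the tombstone must have passed the gc grace period.
    if now - tombstone.deleted_at <= gc_grace:
        return False
    # Condition 2: no other sstable may hold data for the same partition
    # that is older than the tombstone; dropping the tombstone would
    # otherwise let that deleted data reappear.
    for sstable in other_sstables:
        oldest = sstable.min_timestamp_by_partition.get(tombstone.partition)
        if oldest is not None and oldest < tombstone.timestamp:
            return False
    return True


# Example: older data for partition "p1" in another sstable blocks eviction.
blocking = SSTableSummary("other", {"p1": 50})
tombstone = Tombstone(partition="p1", timestamp=100, deleted_at=0)
blocked = can_evict(tombstone, [blocking], gc_grace=10, now=100)
```

The check works at partition granularity, as in the reply, so a single old write anywhere in the partition is enough to keep the tombstone around even after gc grace has passed.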