Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Reply-To: dev@cassandra.apache.org
Date: Tue, 22 Aug 2017 11:37:00 +0000 (UTC)
From: "Romain GERARD (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136652#comment-16136652 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:36 AM:
---------------------------------------------------------------------

New version here: https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b

* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish. I don't have a strong opinion about it, just say so.

{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?
{quote}

In my case you are right: activating disableTombstoneCompaction or setting tombstoneThreshold high enough should perform better. My intention in activating the option is to guarantee consistent behavior for overlap checks.
I wasn't comfortable ignoring overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, since in both cases the job does not get done because the check is made globally instead of locally to the sstable. I wanted to enforce the rule {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}}.

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}}, but it changes a lot of things. Do you know if it is up to date?

was (Author: rgerard):
New version here: https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b

* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish. I don't have a strong opinion about it.

{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?
{quote}

In my case you are right: activating disableTombstoneCompaction or setting tombstoneThreshold high enough should perform better. My intention in activating the option is to guarantee consistent behavior for overlap checks.
I wasn't comfortable ignoring overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, since in both cases the job does not get done because the check is made globally instead of locally to the sstable. I wanted to enforce the rule {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}}.

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}}, but it changes a lot of things. Do you know if it is up to date?

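The trade-off discussed in the comment above can be made concrete with a small model of the single-sstable tombstone compaction decision. This is only a sketch under assumptions: the function name and parameters are hypothetical, they loosely mirror the options named in the thread (disableTombstoneCompaction, tombstoneThreshold, uncheckedTombstoneCompaction), and the overlap handling is simplified so that any overlap blocks the compaction. It is not the code from the linked commit.

```python
def worth_tombstone_compaction(droppable_ratio: float,
                               tombstone_threshold: float = 0.2,
                               unchecked: bool = False,
                               has_overlaps: bool = False,
                               disabled: bool = False) -> bool:
    """Hypothetical model: should a single sstable be compacted to purge tombstones?"""
    # Tombstone compactions turned off, or not enough droppable tombstones yet.
    if disabled or droppable_ratio <= tombstone_threshold:
        return False
    # "Unchecked" mode looks only at the sstable itself (the local check).
    if unchecked:
        return True
    # Simplifying assumption: any overlapping sstable blocks the compaction
    # (the global check).
    return not has_overlaps


# An sstable with 25% droppable tombstones that overlaps other sstables:
print(worth_tombstone_compaction(0.25, has_overlaps=True))                   # False
print(worth_tombstone_compaction(0.25, has_overlaps=True, unchecked=True))   # True
# Raising the threshold avoids the compaction even in unchecked mode:
print(worth_tombstone_compaction(0.25, tombstone_threshold=0.99,
                                 unchecked=True, has_overlaps=True))         # False
```

In this model, unchecked mode combined with a high threshold keeps the overlap checks consistently local while leaving the sstable alone until it can be dropped whole, which is the combination the comment suggests.
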
> Allow TWCS to ignore overlaps when dropping fully expired sstables
> ------------------------------------------------------------------
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Corentin Chary
> Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If you really want read-repairs, you're going to have sstables blocking the expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a very low value, and that will purge the blockers of old data that should already have expired, thus removing the overlaps and allowing the other SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have time series, you might not care if all your data doesn't exactly expire at the right time, or if data re-appears for some time, as long as it gets deleted as soon as it can. And in this situation I believe it would be really beneficial to allow users to simply ignore overlapping SSTables when looking for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be enough to greatly reduce the entropy of the most used data (and if you have time series, you're likely to have a dashboard doing the same important queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.

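To illustrate how an overlapping sstable can block the expiration described in the issue, here is a minimal sketch of a fully-expired check with and without overlap checking. It is a simplified model and not Cassandra's implementation: the `SSTable` class, its fields, the `fully_expired` function and the `ignore_overlaps` flag are hypothetical names chosen for the example, and the timestamps are arbitrary integers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    name: str
    min_timestamp: int            # oldest write timestamp in the sstable
    max_timestamp: int            # newest write timestamp in the sstable
    max_local_deletion_time: int  # time after which all of its data has expired


def fully_expired(candidates: List[SSTable], overlapping: List[SSTable],
                  gc_before: int, ignore_overlaps: bool = False) -> List[SSTable]:
    """Return the candidates that could be dropped whole, without compaction."""
    expired = [s for s in candidates if s.max_local_deletion_time < gc_before]
    if ignore_overlaps:
        # Local check only: the sstable's own metadata decides.
        return expired
    # Global check: a still-live sstable holding older timestamps may contain
    # data that the expired sstable shadows, so it blocks the drop.
    live = [s for s in candidates + overlapping
            if s.max_local_deletion_time >= gc_before]
    min_live_timestamp = min((s.min_timestamp for s in live), default=float("inf"))
    return [s for s in expired if s.max_timestamp < min_live_timestamp]


# "old" has fully expired; "blocker" is a newer sstable into which a
# read-repair wrote some old data, so its timestamp range overlaps "old".
old = SSTable("old", min_timestamp=100, max_timestamp=200,
              max_local_deletion_time=1000)
blocker = SSTable("blocker", min_timestamp=150, max_timestamp=900,
                  max_local_deletion_time=5000)

print([s.name for s in fully_expired([old], [blocker], gc_before=2000)])
print([s.name for s in fully_expired([old], [blocker], gc_before=2000,
                                     ignore_overlaps=True)])
```

With the global check the first call returns nothing, because `blocker` is still live and overlaps `old`; with `ignore_overlaps=True` the second call returns `old`, at the cost the issue mentions: data may re-appear for some time.
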