Date: Thu, 13 Nov 2014 12:29:34 +0000 (UTC)
From: "Marcus Eriksson (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-8243) DTCS can leave time-overlaps, limiting ability to expire entire SSTables

[ https://issues.apache.org/jira/browse/CASSANDRA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-8243:
---------------------------------------
Reviewer: Sylvain Lebresne (was: Marcus Eriksson)

> DTCS can leave time-overlaps, limiting ability to expire entire SSTables
> ------------------------------------------------------------------------
>
> Key: CASSANDRA-8243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8243
> Project: Cassandra
> Issue Type: Bug
> Reporter: Björn Hegerfors
> Assignee: Björn Hegerfors
> Priority: Minor
> Labels:
compaction, performance
> Fix For: 2.0.12, 2.1.3
>
> Attachments: cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt
>
>
> CASSANDRA-6602 (DTCS) and CASSANDRA-5228 are supposed to be a perfect match for tables where every value is written with a TTL. DTCS makes sure to keep old data separate from new data. So shortly after the TTL has passed, Cassandra should be able to throw away the whole SSTable containing a given data point.
> CASSANDRA-5228 deletes the very oldest SSTables, and only if they don't overlap (in terms of timestamps) with another SSTable which cannot be deleted.
> DTCS however, can't guarantee that SSTables won't overlap (again, in terms of timestamps). In a test that I ran, every single SSTable overlapped with its nearest neighbors by a very tiny amount. My reasoning for why this could happen is that the dumped memtables were already overlapping from the start. DTCS will never create an overlap where there is none. I surmised that this happened in my case because I sent parallel writes which must have come out of order. This was just locally, and out of order writes should be much more common non-locally.
> That means that the SSTable removal optimization may never get a chance to kick in!
> I can see two solutions:
> 1. Make DTCS split SSTables on time window borders. This will essentially only be done on a newly dumped memtable once every base_time_seconds.
> 2. Make TTL SSTable expiry more aggressive. Relax the conditions on which an SSTable can be dropped completely, of course without affecting any semantics.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
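
The overlap problem described in the issue can be illustrated with a small model. The following is only a sketch under stated assumptions, not Cassandra's code: the `SSTable` class, its fields, and the `droppable` function are hypothetical names, and "cannot be deleted" is read here as "still holds unexpired data". It shows how a single out-of-order write that makes two neighbors touch by one timestamp is enough to keep a fully expired SSTable from being dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for SSTable metadata (not Cassandra's API)."""
    name: str
    min_ts: int      # smallest write timestamp in the table
    max_ts: int      # largest write timestamp in the table
    expires_at: int  # time after which every cell's TTL has passed


def overlaps(a: SSTable, b: SSTable) -> bool:
    """True if the two timestamp ranges share at least one point."""
    return a.min_ts <= b.max_ts and b.min_ts <= a.max_ts


def droppable(sstables, now):
    """Names of fully expired SSTables that overlap no unexpired SSTable."""
    # Assumption: a table "cannot be deleted" while it holds unexpired data.
    live = [s for s in sstables if s.expires_at > now]
    return [
        s.name
        for s in sstables
        if s.expires_at <= now
        and not any(overlaps(s, other) for other in live)
    ]


# Disjoint time windows (TTL of 100): the expired table can go as a whole.
disjoint = [SSTable("old", 0, 99, 199), SSTable("new", 100, 199, 299)]

# Same data, but one out-of-order write with timestamp 100 landed in "old".
overlapping = [SSTable("old", 0, 100, 200), SSTable("new", 100, 199, 299)]

print(droppable(disjoint, now=250))
print(droppable(overlapping, now=250))
```

In this model the first call reports `old` as droppable, while the second reports nothing until `new` has expired as well. Under the same assumptions, the two proposed solutions correspond to removing the overlap before it matters (splitting on window borders) or loosening the `overlaps` condition (more aggressive expiry).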