From: "Jeff Jirsa (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Date: Wed, 11 May 2016 16:58:13 +0000 (UTC)
Subject: [jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

[ https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280425#comment-15280425 ]

Jeff Jirsa commented on CASSANDRA-9666:
---------------------------------------

I love that you wrote the simulator, because it's
REALLY hard to test this in real life.

I also fully agree that TWCS needs a plan before 10496. TWCS as-is with 10496 would be painful.

I had been thinking about what I'd want to do with TWCS old windows in the context of CASSANDRA-10496. The technique I was planning on adopting was using STCS in old windows, but morphing the STCS parameters (notably {{min_sstable_size}} rather than {{bucket_low}} or {{bucket_high}}) based on either the age of the sstables or the age of the window to EVENTUALLY get back to a single sstable per window. I like your idea of doing the single major first, flagging it with a boolean in a system table, and then coming back and doing STCS on any new data after the fact. That also lets (sufficiently advanced) users go clear that boolean and force a new major if they really want that behavior.

> Provide an alternative to DTCS
> ------------------------------
>
>                 Key: CASSANDRA-9666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeff Jirsa
>            Assignee: Jeff Jirsa
>             Fix For: 2.1.x, 2.2.x
>
>         Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it difficult to use in production (typical operator behaviors such as bootstrap, removenode, and repair have MAJOR caveats as they relate to max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the tiered nature of DTCS in order to address some of DTCS' operational shortcomings.
> I believe it is necessary to propose an alternative rather than simply adjusting DTCS, because it fundamentally removes the tiered nature in order to remove the parameter max_sstable_age_days - the result is very, very different, even if it is heavily inspired by DTCS.
> Specifically, rather than creating a number of windows of ever increasing sizes, this strategy allows an operator to choose the window size, compact with STCS within the first window of that size, and aggressively compact down to a single sstable once that window is no longer current. The window size is a combination of unit (minutes, hours, days) and size (1, etc), such that an operator can expect all data using a block of that size to be compacted together (that is, if your unit is hours, and size is 6, you will create roughly 4 sstables per day, each one containing roughly 6 hours of data).
> The result addresses a number of the problems with DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using unusual selection criteria, which prefer files with earlier timestamps but ignore sizes. In TimeWindowCompactionStrategy, the first window data will be compacted with the well tested, fast, reliable STCS. All STCS options can be passed to TimeWindowCompactionStrategy to configure the first window’s compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and aggressively compacted with the other sstables matching their similar maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning parameters leave little room for error. The window is set in common, easily understandable terms such as “12 hours”, “1 Day”, “30 days”. The minute/hour/day options are granular enough for users keeping data for hours, and users keeping data for years.
> - There is no explicitly configurable max sstable age, though sstables will naturally stop compacting once new data is written in that window.
> - Streaming operations can create sstables with old timestamps, and they'll naturally be joined together with sstables in the same time bucket.
>   This is true for bootstrap/repair/sstableloader/removenode.
> - It remains true that if old data and new data are written into the memtable at the same time, the resulting sstables will be treated as if they were new sstables; however, that no longer negatively impacts the compaction strategy’s selection criteria for older windows.
> Patch provided for:
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099): https://github.com/jeffjirsa/cassandra/commits/twcs
> Rebased, force-pushed July 18, with bug fixes for estimated pending compactions and potential starvation if more than min_threshold tables existed in current window but STCS did not consider them viable candidates
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882
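
The windowing described in the quoted issue text can be sketched roughly as follows. This is a minimal illustration, not code from the patch: it assumes windows are aligned to the Unix epoch and that each sstable is assigned to a window by its maximum timestamp. The function names (window_start, bucket_by_window, compaction_candidates) and the "stcs" / "compact_to_one" labels are hypothetical. The current-window branch only marks a bucket as something to hand to STCS; it does not implement STCS itself.

```python
from collections import defaultdict

# Window units named in the issue text, expressed in seconds.
SECONDS_PER_UNIT = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_start(timestamp_s, unit, size):
    """Lower bound (epoch seconds) of the window containing timestamp_s.

    Assumption: windows are aligned to the epoch, so a 6-hour window
    starts at 00:00, 06:00, 12:00 or 18:00 UTC.
    """
    width = SECONDS_PER_UNIT[unit] * size
    return timestamp_s - (timestamp_s % width)


def bucket_by_window(sstables, unit, size):
    """Group (name, max_timestamp_s) pairs by the window they fall in."""
    buckets = defaultdict(list)
    for name, max_ts in sstables:
        buckets[window_start(max_ts, unit, size)].append(name)
    return dict(buckets)


def compaction_candidates(buckets, now_s, unit, size, min_threshold=4):
    """Decide, per window, what kind of compaction would be considered.

    Current window: left to STCS once at least min_threshold sstables
    exist (real STCS also groups by size, which is omitted here).
    Older windows: compact everything down to a single sstable.
    """
    current = window_start(now_s, unit, size)
    candidates = {}
    for start, names in buckets.items():
        if start == current:
            if len(names) >= min_threshold:
                candidates[start] = ("stcs", names)
        elif len(names) > 1:
            candidates[start] = ("compact_to_one", names)
    return candidates


if __name__ == "__main__":
    hour = 3600
    sstables = [("a", 0), ("b", 3 * hour), ("c", 6 * hour),
                ("d", 7 * hour), ("e", 23 * hour)]
    buckets = bucket_by_window(sstables, "HOURS", 6)
    print(buckets)
    print(compaction_candidates(buckets, 23 * hour + 100, "HOURS", 6))
```

With unit HOURS and size 6, every timestamp within one day maps to one of four window starts, which matches the "roughly 4 sstables per day" figure once each old window has been compacted down to a single sstable.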