Date: Tue, 29 Mar 2016 19:21:25 +0000 (UTC)
From: "Jonathan Shook (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

    [ https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216688#comment-15216688 ]

Jonathan Shook commented on CASSANDRA-9666:
-------------------------------------------

There are two areas of concern that we should discuss more directly.

1. The pacing of memtable flushing on a given system can be matched up with the base window size with DTCS, avoiding logical write amplification that can occur before the scheduling discipline kicks in. This is not so easy when you water down the configuration and remove the ability to manage the fresh sstables. The benefits from time-series friendly compaction can be had for both the newest and the oldest tables, and both are relevant here.

2. The window placement. From what I've seen, the anchoring point for whether a cell goes into a bucket or not is different between the two approaches. To me this is fairly arbitrary in terms of processing overhead comparisons, all else assumed close enough. However, when trying to reconcile, shifting all of your data to a different bucket will not be a welcome event for most users. This makes "graceful" reconciliation difficult at best.

Can we simply try to make DTCS as (perceptually) easy to use for the default case as TWCS (perceptually)? To me, this is more about the user entry point and understanding behavior as designed than it is about the machinery that makes it happen.

The basic design between them has so much in common that reconciling them completely would be mostly a shell game of parameter names as well as lopping off some functionality that can be completely bypassed, given the right settings.

Can we identify the functionally equivalent settings for TWCS that DTCS needs to emulate, given proper settings (possibly including anchoring point), and then simply provide the same simple configuration to users, without having to maintain two separate sibling compaction strategies?

One sticking point that I've had on this suggestion in conversation is the bucketing logic being too difficult to think about. If we were able to provide the self-same behavior for TWCS-like configuration, the bucketing logic could be used only when the parameters require non-uniform windows.
Would that make everyone happy?

> Provide an alternative to DTCS
> ------------------------------
>
>                 Key: CASSANDRA-9666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeff Jirsa
>            Assignee: Jeff Jirsa
>             Fix For: 2.1.x, 2.2.x
>
>         Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it difficult to use in production (typical operator behaviors such as bootstrap, removenode, and repair have MAJOR caveats as they relate to max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the tiered nature of DTCS in order to address some of DTCS' operational shortcomings. I believe it is necessary to propose an alternative rather than simply adjusting DTCS, because it fundamentally removes the tiered nature in order to remove the parameter max_sstable_age_days - the result is very very different, even if it is heavily inspired by DTCS.
> Specifically, rather than creating a number of windows of ever increasing sizes, this strategy allows an operator to choose the window size, compact with STCS within the first window of that size, and aggressively compact down to a single sstable once that window is no longer current. The window size is a combination of unit (minutes, hours, days) and size (1, etc), such that an operator can expect all data using a block of that size to be compacted together (that is, if your unit is hours, and size is 6, you will create roughly 4 sstables per day, each one containing roughly 6 hours of data).
> The result addresses a number of the problems with DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual selection criterion, which prefers files with earlier timestamps but ignores sizes. In TimeWindowCompactionStrategy, the first window data will be compacted with the well tested, fast, reliable STCS. All STCS options can be passed to TimeWindowCompactionStrategy to configure the first window’s compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and aggressively compacted with the other sstables matching their similar maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning parameters leave little room for error. The window is set in common, easily understandable terms such as “12 hours”, “1 Day”, “30 days”. The minute/hour/day options are granular enough for users keeping data for hours, and users keeping data for years.
> - There is no explicitly configurable max sstable age, though sstables will naturally stop compacting once new data is written in that window.
> - Streaming operations can create sstables with old timestamps, and they'll naturally be joined together with sstables in the same time bucket. This is true for bootstrap/repair/sstableloader/removenode.
> - It remains true that if old data and new data are written into the memtable at the same time, the resulting sstables will be treated as if they were new sstables; however, that no longer negatively impacts the compaction strategy’s selection criteria for older windows.
> Patch provided for:
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099): https://github.com/jeffjirsa/cassandra/commits/twcs
> Rebased, force-pushed July 18, with bug fixes for estimated pending compactions and potential starvation if more than min_threshold tables existed in current window but STCS did not consider them viable candidates
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
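
For readers following the window-size discussion in the quoted description (unit of hours, size of 6, roughly 4 sstables per day), here is a minimal sketch of fixed-width time bucketing. It is not the TimeWindowCompactionStrategy code: the names window_start, group_by_window and UNIT_SECONDS are hypothetical, and anchoring windows to the Unix epoch is an assumption made purely for illustration (the comment above notes that the anchoring point differs between the two strategies).

```python
# Illustrative sketch only. Function names and the epoch-aligned anchoring
# are assumptions, not the TimeWindowCompactionStrategy implementation.
from collections import defaultdict

# Window units named in the issue description, expressed in seconds.
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_start(timestamp_s, unit, size):
    """Return the lower bound (epoch seconds) of the window holding timestamp_s."""
    width = UNIT_SECONDS[unit] * size
    return timestamp_s - (timestamp_s % width)


def group_by_window(max_timestamps, unit, size):
    """Group sstables, each represented by its max timestamp, by time window."""
    windows = defaultdict(list)
    for ts in max_timestamps:
        windows[window_start(ts, unit, size)].append(ts)
    return dict(windows)


if __name__ == "__main__":
    day_start = 16889 * 86400  # an epoch-aligned day boundary
    # One hypothetical sstable flushed per hour over a single day.
    hourly = [day_start + h * 3600 for h in range(24)]
    for start, members in sorted(group_by_window(hourly, "HOURS", 6).items()):
        print(start, len(members))
```

Under these assumptions, a day of hourly flushes falls into four 6-hour windows. An old sstable arriving later by streaming would be grouped by its max timestamp into the window it belongs to, without affecting neighboring windows.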