Date: Thu, 21 Jan 2016 11:51:40 +0000 (UTC)
From: Björn Hegerfors (JIRA)
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

[ https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110474#comment-15110474 ]

Björn Hegerfors commented on CASSANDRA-9666:
--------------------------------------------

Unless I forgot something, the only remaining difference is the use of max_timestamp instead of min_timestamp (and the tiering before max_window_size is reached, but I don't see how that can be bad).
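To make the max/min difference concrete, here is a minimal Java sketch of assigning an sstable to a fixed-size time window by either its newest or its oldest timestamp. It is not code from the TWCS or DTCS patches: the `SSTableTimestamps` class, its methods, the millisecond timestamps and the epoch-aligned windows are all assumptions made for illustration.

```java
/**
 * Hypothetical sstable summary holding only the two timestamps under discussion.
 * Assumes timestamps are epoch milliseconds and windows are aligned to the epoch.
 */
final class SSTableTimestamps {
    final long minTimestamp; // oldest cell in the sstable
    final long maxTimestamp; // newest cell in the sstable

    SSTableTimestamps(long minTimestamp, long maxTimestamp) {
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
    }

    /** Start of the fixed-size window that contains ts. */
    static long windowStart(long ts, long windowMillis) {
        // floorDiv rounds toward negative infinity, so pre-epoch values also land correctly.
        return Math.floorDiv(ts, windowMillis) * windowMillis;
    }

    /** Group by the newest data in the sstable (the max_timestamp choice). */
    long windowByMax(long windowMillis) {
        return windowStart(maxTimestamp, windowMillis);
    }

    /** Group by the oldest data in the sstable (the min_timestamp choice). */
    long windowByMin(long windowMillis) {
        return windowStart(minTimestamp, windowMillis);
    }
}
```

With 6-hour windows, an sstable flushed at hour 31 that also holds a hinted cell from hour 2 lands in the window starting at hour 30 under max_timestamp, but in the window starting at hour 0 under min_timestamp: a single old cell is enough to move it back in time.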
I never had a strong argument either way for max/min. But [~jjirsa] seems to have well-founded reasons for his choice. So I see absolutely no reason why we wouldn't just swap this in DTCS. WDYT [~krummas]?

My stance on the ticket is that I don't see a reason for there to be two strategies for the same thing. But I also don't see a reason to go with the one that is less configurable. My takeaway from TWCS is that DTCS clearly hasn't had sane defaults, and in some cases hasn't even provided an option to make it behave exactly like TWCS. TWCS may very well be some kind of sweet spot, but then let's make that the default of a strategy that is not afraid of having some knobs that advanced users can turn.

> Provide an alternative to DTCS
> ------------------------------
>
>                 Key: CASSANDRA-9666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeff Jirsa
>            Assignee: Jeff Jirsa
>             Fix For: 2.1.x, 2.2.x
>
>         Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it difficult to use in production (typical operator behaviors such as bootstrap, removenode, and repair have MAJOR caveats as they relate to max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the tiered nature of DTCS in order to address some of DTCS's operational shortcomings. I believe it is necessary to propose an alternative rather than simply adjusting DTCS, because it fundamentally removes the tiered nature in order to remove the parameter max_sstable_age_days; the result is very different, even if it is heavily inspired by DTCS.
> Specifically, rather than creating a number of windows of ever-increasing sizes, this strategy allows an operator to choose the window size, compact with STCS within the first window of that size, and aggressively compact down to a single sstable once that window is no longer current. The window size is a combination of unit (minutes, hours, days) and size (1, etc.), such that an operator can expect all data falling within a block of that size to be compacted together (that is, if your unit is hours and size is 6, you will create roughly 4 sstables per day, each one containing roughly 6 hours of data).
> The result addresses a number of the problems with DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual selection criterion, which prefers files with earlier timestamps but ignores sizes. In TimeWindowCompactionStrategy, the first window's data will be compacted with the well-tested, fast, reliable STCS. All STCS options can be passed to TimeWindowCompactionStrategy to configure the first window’s compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction).
> - ReadRepair may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction).
> - Small, old sstables resulting from streams of any kind will be swiftly and aggressively compacted with the other sstables matching their similar maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward; the tuning parameters leave little room for error. The window is set in common, easily understandable terms such as “12 hours”, “1 Day”, “30 days”. The minute/hour/day options are granular enough both for users keeping data for hours and for users keeping data for years.
> - There is no explicitly configurable max sstable age, though sstables will naturally stop compacting once new data is written in that window.
> - Streaming operations can create sstables with old timestamps, and they'll naturally be joined together with sstables in the same time bucket. This is true for bootstrap/repair/sstableloader/removenode.
> - It remains true that if old data and new data are written into the memtable at the same time, the resulting sstables will be treated as if they were new sstables; however, that no longer negatively impacts the compaction strategy’s selection criteria for older windows.
> Patch provided for:
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099): https://github.com/jeffjirsa/cassandra/commits/twcs
> Rebased, force-pushed July 18, with bug fixes for estimated pending compactions and potential starvation if more than min_threshold tables existed in the current window but STCS did not consider them viable candidates.
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882.
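The window rule in the quoted description (a unit plus a size, STCS inside the current window, and a merge down to one sstable per older window) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the linked branches: `TimeWindowSketch` and its methods are hypothetical, each sstable is reduced to its max timestamp in epoch milliseconds, and real STCS selection in the current window is replaced by a plain "at least minThreshold files" stand-in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified TWCS-style planner; not the code from the linked branches. */
final class TimeWindowSketch {
    /** Window length from a unit and a size, e.g. (HOURS, 6) gives 6 hours in millis. */
    static long windowMillis(TimeUnit unit, int size) {
        return unit.toMillis(size);
    }

    /** Start of the epoch-aligned window containing ts (epoch millis assumed). */
    static long windowStart(long ts, long windowMillis) {
        return Math.floorDiv(ts, windowMillis) * windowMillis;
    }

    /** Bucket each sstable, represented here only by its max timestamp, newest window first. */
    static TreeMap<Long, List<Long>> bucketByMaxTimestamp(List<Long> maxTimestamps, long windowMillis) {
        TreeMap<Long, List<Long>> buckets = new TreeMap<>(Comparator.reverseOrder());
        for (long ts : maxTimestamps) {
            buckets.computeIfAbsent(windowStart(ts, windowMillis), k -> new ArrayList<>()).add(ts);
        }
        return buckets;
    }

    /**
     * Choose what to compact next, scanning from the newest window:
     * - current window: the description delegates this to STCS; stubbed here as
     *   "at least minThreshold files";
     * - older windows: merge everything into one sstable as soon as there are two or more files.
     * Returns an empty list when nothing qualifies.
     */
    static List<Long> pick(TreeMap<Long, List<Long>> buckets, long now, long windowMillis, int minThreshold) {
        long current = windowStart(now, windowMillis);
        for (Map.Entry<Long, List<Long>> e : buckets.entrySet()) {
            List<Long> files = e.getValue();
            boolean isCurrent = e.getKey() >= current;
            if (isCurrent && files.size() >= minThreshold) {
                return files; // stand-in for size-tiered selection inside the current window
            }
            if (!isCurrent && files.size() >= 2) {
                return files; // old window: compact down to a single sstable
            }
        }
        return Collections.emptyList();
    }
}
```

With (HOURS, 6) this yields 4 windows per day, matching the example in the description. Because bucketing uses the max timestamp, a small streamed sstable whose newest cell is old lands in that old window and is merged with its peers there, rather than enlarging a neighboring window.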