Date: Fri, 21 Jun 2013 16:16:21 +0000 (UTC)
From: "Edward Capriolo (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-5561) Compaction strategy that minimizes re-compaction of old/frozen data

    [ https://issues.apache.org/jira/browse/CASSANDRA-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690438#comment-13690438 ]

Edward Capriolo commented on CASSANDRA-5561:
--------------------------------------------

{quote}
How is this supposed to give better results than simply not compacting sstables that we don't need to read anymore, as suggested in CASSANDRA-5515?
{quote}

That approach could work; however, I think a simple (and dumb) fixed-time-bucketing approach would solve the problem. The way it currently works with size-tiered compaction, minCompactionThreshold or maxCompactionThreshold kicks in and mashes "old" and "new" sstables together, and the situation becomes unmanageable. Eventually everything gets compacted together in some unmanageable way, and once you hit 80% disk usage you fall over a cliff and there is no way out.
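For concreteness, here is a minimal sketch of the fixed-time-bucketing idea in plain Java. Everything in it is hypothetical -- the SSTable interface and all names are stand-ins, not real Cassandra APIs: group sstables by the fixed window containing their newest column timestamp, and treat a window as "frozen" once the whole window is older than the out-of-order-delivery cutoff.

{code:java}
import java.util.*;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only -- "SSTable" is a stand-in interface, not the
// real Cassandra class, and nothing here is wired into the compaction
// machinery. It just shows the fixed-time-bucketing bookkeeping.
public class TimeWindowBucketer
{
    interface SSTable
    {
        long maxTimestampMicros(); // newest column timestamp in the sstable
    }

    private final long windowMicros;        // fixed bucket width
    private final long maxOutOfOrderMicros; // per-CF out-of-order-delivery window

    public TimeWindowBucketer(long windowMillis, long maxOutOfOrderMillis)
    {
        this.windowMicros = TimeUnit.MILLISECONDS.toMicros(windowMillis);
        this.maxOutOfOrderMicros = TimeUnit.MILLISECONDS.toMicros(maxOutOfOrderMillis);
    }

    /** Group sstables into fixed time buckets keyed by window start. */
    public SortedMap<Long, List<SSTable>> bucket(Collection<SSTable> sstables)
    {
        SortedMap<Long, List<SSTable>> buckets = new TreeMap<Long, List<SSTable>>();
        for (SSTable s : sstables)
        {
            long windowStart = (s.maxTimestampMicros() / windowMicros) * windowMicros;
            List<SSTable> bucket = buckets.get(windowStart);
            if (bucket == null)
            {
                bucket = new ArrayList<SSTable>();
                buckets.put(windowStart, bucket);
            }
            bucket.add(s);
        }
        return buckets;
    }

    /**
     * A window is "frozen" once the entire window is older than the
     * out-of-order-delivery cutoff; anything newer stays with the
     * normal LCS/STCS strategy.
     */
    public boolean isFrozen(long windowStart, long nowMicros)
    {
        return windowStart + windowMicros + maxOutOfOrderMicros < nowMicros;
    }
}
{code}

The point of fixed buckets is that a frozen window's membership never changes, so it can be compacted once into a single sstable and then left alone, instead of being repeatedly mashed together with newer data.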
> Compaction strategy that minimizes re-compaction of old/frozen data
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-5561
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5561
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2.3
>            Reporter: Tupshin Harper
>             Fix For: 2.0.1
>
>
> Neither LCS nor STCS is good for data that becomes immutable over time. The most obvious case is time-series data where the application can guarantee that out-of-order delivery (to Cassandra) of an event can't take place more than N seconds/minutes/hours/days after the real (wall-clock) time of the event.
> There are various approaches that could involve paying attention to the row keys (if they include a time component) and/or the column names (if they are TimeUUID or Integer based and inherently time-ordered), but it might be sufficient to just look at the timestamps of the columns themselves.
> A possible approach:
> 1) Define an optional max-out-of-order window on a per-CF basis.
> 2) Use the normal (LCS or STCS) compaction strategy for any SSTables that include any columns younger than max-out-of-order-delivery.
> 3) Use an alternate compaction strategy (call it TWCS, time window compaction strategy, for now) for any SSTables that only contain columns older than max-out-of-order-delivery.
> 4) TWCS will only compact sstables containing data older than max-out-of-order-delivery.
> 5) TWCS will only perform compaction to reduce row fragmentation (if there is any by the time it gets to TWCS) or to reduce the number of small sstables.
> 6) To minimize re-compaction in TWCS, it should aggressively try to compact as many small sstables as possible into one large sstable that would never have to be recompacted (see the selection sketch after this description).
> In the case of large datasets (e.g. 5TB per node) with LCS, there would be on the order of seven levels, and hence seven separate writes of the same data over time. With this approach, it should be possible to get down to about 3 compactions per column (2 under the normal strategy and one more once the data reaches TWCS) in most cases, cutting the write workload by a factor of two or more for high-volume time-series applications.
> Note that the only workaround I can currently suggest to minimize compaction for these workloads is to programmatically shard your data across time-window ranges (e.g. a new CF per week), but that pushes unnecessary writing and querying logic out to the user and is neither as convenient nor as flexible.
> Also note that I am not convinced that the approach I've suggested above is the best/most general way to solve the problem, but it does appear to be a relatively easy one to implement.
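A hedged follow-on sketch of how points 4)-6) might select compaction candidates, building on the (equally hypothetical) TimeWindowBucketer above: take the oldest frozen window that is still fragmented and merge the whole window into one sstable.

{code:java}
import java.util.*;

// Hypothetical follow-on to the TimeWindowBucketer sketch above,
// illustrating points 4)-6): pick the oldest frozen window that is
// still fragmented and merge the whole window into a single sstable.
public class TwcsCandidateSelector
{
    public static List<TimeWindowBucketer.SSTable> nextCompaction(
            SortedMap<Long, List<TimeWindowBucketer.SSTable>> buckets,
            TimeWindowBucketer bucketer,
            long nowMicros)
    {
        // SortedMap iteration visits the oldest window first
        for (Map.Entry<Long, List<TimeWindowBucketer.SSTable>> e : buckets.entrySet())
        {
            // a window that is already a single sstable never needs touching again
            if (bucketer.isFrozen(e.getKey(), nowMicros) && e.getValue().size() > 1)
                return e.getValue(); // compact the whole window in one shot
        }
        return Collections.emptyList(); // nothing frozen and fragmented yet
    }
}
{code}

Compacting an entire frozen window in one shot is what bounds the rewrites at roughly one extra compaction per column after the data leaves the normal strategy, matching the "about 3 compactions per column" estimate above.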