Date: Fri, 5 Sep 2014 20:13:29 +0000 (UTC)
From: "Colin Taylor (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-7890) LCS and time series data

    [ https://issues.apache.org/jira/browse/CASSANDRA-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123490#comment-14123490 ]

Colin Taylor edited comment on CASSANDRA-7890 at 9/5/14 8:12 PM:
-----------------------------------------------------------------

I had a very similar idea, which I called Partially Ordered Row Keys, or the PORK Partitioner, but yeah, it didn't seem trivial.

was (Author: coltnz):
I had a very similar idea, I called it Partially Ordered Row Keys or PORK Partitioner.

> LCS and time series data
> ------------------------
>
>                 Key: CASSANDRA-7890
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7890
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Dan Hendry
>             Fix For: 3.0
>
>
> Consider the following very typical schema for bucketed time series data:
> {noformat}
> CREATE TABLE user_timeline (
>   ts_bucket bigint,
>   username varchar,
>   ts timeuuid,
>   data blob,
>   PRIMARY KEY ((ts_bucket, username), ts))
> {noformat}
> If you have a single Cassandra node (or a cluster where RF = N) and use the ByteOrderedPartitioner, LCS becomes *ridiculously*, *obscenely* efficient. Under a typical workload where data is inserted in order, compaction IO could be reduced to *near zero* because sstable ranges don't overlap (with a trivial change to LCS so that sstables with no overlap are not rewritten when being promoted into the next level). Better yet, we don't _require_ ordered data insertion. Even if insertion order is completely random, you still get standard LCS performance characteristics, which are usually acceptable (although I believe there are a few degenerate compaction cases which are not handled in the current implementation). A quick benchmark using vanilla Cassandra 2.0.10 (i.e. no rewrite optimization) shows a *77% reduction in compaction IO* when switching from the Murmur3Partitioner to the ByteOrderedPartitioner.
> The obvious problem is, of course, that using an order-preserving partitioner is a Very Bad idea when N > RF. Using an OPP for time series data ordered by time is utter lunacy.
> It seems to me that one solution is to split apart the roles of the partitioner so that data distribution across the cluster and data ordering on disk can be controlled independently. Ideally, on-disk ordering could be set per CF. I'm curious about the historical choice to order data on disk by token and not key.
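The overlap argument in the quoted description can be illustrated with a toy model. The sketch below is not Cassandra code: it assumes each flushed sstable can be summarized by the (min, max) of its keys in on-disk order, uses MD5 as a stand-in for the Murmur3 token hash, and uses a zero-padded timestamp as a hypothetical row key rather than the `(ts_bucket, username)` key from the schema. All function names are made up for illustration.

```python
import hashlib

def flush_ranges(keys, batch_size, order_key):
    """Split keys (in insertion order) into batches, one per simulated
    sstable, and return each batch's (min, max) under order_key."""
    ranges = []
    for i in range(0, len(keys), batch_size):
        batch = [order_key(k) for k in keys[i:i + batch_size]]
        ranges.append((min(batch), max(batch)))
    return ranges

def overlapping_pairs(ranges):
    """Count pairs of simulated sstables whose key ranges intersect."""
    count = 0
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            a, b = ranges[i], ranges[j]
            if a[0] <= b[1] and b[0] <= a[1]:
                count += 1
    return count

def natural_order(key):
    # Byte-ordered layout: keys are compared as raw bytes.
    return key

def hashed_order(key):
    # Hash-ordered layout: MD5 stands in for the real token function.
    return hashlib.md5(key).digest()

# Time-ordered inserts: zero-padded timestamps as hypothetical row keys.
keys = [b"%012d" % ts for ts in range(10_000)]

natural = flush_ranges(keys, 1000, natural_order)
hashed = flush_ranges(keys, 1000, hashed_order)

print("overlapping pairs, natural order:", overlapping_pairs(natural))
print("overlapping pairs, hashed order: ", overlapping_pairs(hashed))
```

With time-ordered inserts, the naturally ordered batches cover disjoint key ranges, so a compaction strategy could in principle promote them without rewriting, while the hashed batches each span most of the token space and overlap one another.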
> Randomized (hashed key ordered) distribution across the cluster is obviously a good idea, but natural key ordering on disk seems like it would have a number of advantages:
> * Better read performance and file system page cache efficiency for any workload which accesses certain ranges of row keys more frequently than others (this applies to _many_ use cases beyond time series data).
> * I can't think of a realistic workload where CRUD operations would be noticeably less performant when using natural instead of hash ordering.
> * Better compression ratios (although probably only for skinny rows).
> * Range-based truncation becomes feasible.
> * Ordered range scans might be feasible to implement even with random cluster distribution.
> The only things I can think of which could suffer when using different cluster and disk ordering are bootstrap and repair. Although I have no evidence, the massive potential performance gains certainly still seem to be worth it.
> Thoughts? This approach seems to be fundamentally different from other tickets related to improving time series data (CASSANDRA-6602, CASSANDRA-5561), which focus only on new or modified compaction strategies. By changing data sort order, existing compaction strategies can be made significantly more efficient without imposing new, restrictive, and use-case-specific limitations on the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
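
The quoted proposal to control cluster distribution and on-disk ordering independently, and its suggestion that ordered range scans might remain feasible, can be sketched as follows. This is a toy model under stated assumptions, not Cassandra's design: `owner`, `Node`, and `range_scan` are hypothetical names, MD5 modulo the node count stands in for token-range ownership, and each node is simply a sorted in-memory list of keys.

```python
import hashlib
import heapq
from bisect import bisect_left, bisect_right

NODE_COUNT = 4  # hypothetical cluster size

def owner(key):
    """Pick a node by hashing the key (stand-in for token ownership)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT

class Node:
    """A node that stores its keys in natural (not hashed) order."""

    def __init__(self):
        self.keys = []  # kept sorted by natural key

    def insert(self, key):
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)

    def scan(self, start, end):
        """Return this node's keys in [start, end], already sorted."""
        lo = bisect_left(self.keys, start)
        hi = bisect_right(self.keys, end)
        return self.keys[lo:hi]

def range_scan(nodes, start, end):
    """Ask every node for its slice and merge the sorted results."""
    return list(heapq.merge(*(n.scan(start, end) for n in nodes)))

nodes = [Node() for _ in range(NODE_COUNT)]
all_keys = [f"user{i:04d}" for i in range(200)]
for k in all_keys:
    nodes[owner(k)].insert(k)

result = range_scan(nodes, "user0050", "user0059")
print(result)
```

In this model a range scan has to contact every node, since hashing scatters adjacent keys, but each node can answer from one contiguous slice and the coordinator only merges already-sorted streams. The sketch does not model the costs to bootstrap and repair that the description mentions.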