Date: Thu, 4 Feb 2016 01:29:39 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-11035) Use cardinality estimation to pick better compaction candidates for STCS (SizeTieredCompactionStrategy)

    [ https://issues.apache.org/jira/browse/CASSANDRA-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131512#comment-15131512 ]

Jonathan Ellis commented on CASSANDRA-11035:
--------------------------------------------

The problem here was that you end up doing quadratic work, comparing each sstable to every other sstable to find the best candidates to merge. So the question is: do we try to come up with a clever way to avoid this?
Or do we go ahead and brute force it, which would require updating HyperLogLog to use off-heap registers? (The latter actually looks pretty easy, now that I check the source.)

> Use cardinality estimation to pick better compaction candidates for STCS (SizeTieredCompactionStrategy)
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11035
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11035
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Wei Deng
>
> This was initially mentioned in this blog post http://www.datastax.com/dev/blog/improving-compaction-in-cassandra-with-cardinality-estimation but I couldn't find any existing JIRA for it. As stated by [~jbellis], "Potentially even more useful would be using cardinality estimation to pick better compaction candidates. Instead of blindly merging sstables of a similar size a la SizeTieredCompactionStrategy." The L0 STCS in LCS should benefit as well.
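
For illustration only, here is a minimal sketch of what the brute-force option in the comment amounts to: score every pair of sstables by estimated key overlap and pick the best pair, which costs n(n-1)/2 comparisons. This is not Cassandra's code. Exact Python sets stand in for HyperLogLog sketches (a real sketch would give approximate cardinalities and merge registers instead of taking a set union), and the names `estimated_overlap`, `best_pair`, and the `sstable-N` labels are hypothetical.

```python
from itertools import combinations


def estimated_overlap(keys_a, keys_b):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.

    With HyperLogLog, each term would be a cardinality estimate and the
    union would come from merging the two sketches. Sets are a stand-in.
    """
    return len(keys_a) + len(keys_b) - len(keys_a | keys_b)


def best_pair(sstables):
    """Brute force: score every pair and keep the highest overlap ratio.

    Returns ((ratio, name_a, name_b), comparisons). The comparison count
    grows quadratically with the number of sstables.
    """
    best = None
    comparisons = 0
    for (name_a, keys_a), (name_b, keys_b) in combinations(sorted(sstables.items()), 2):
        comparisons += 1
        union = len(keys_a | keys_b)
        ratio = estimated_overlap(keys_a, keys_b) / union if union else 0.0
        if best is None or ratio > best[0]:
            best = (ratio, name_a, name_b)
    return best, comparisons


# Hypothetical partition-key sets for four similarly sized sstables.
sstables = {
    "sstable-1": set(range(0, 100)),
    "sstable-2": set(range(50, 150)),
    "sstable-3": set(range(90, 190)),
    "sstable-4": set(range(1000, 1100)),
}
best, comparisons = best_pair(sstables)
print(best, comparisons)
```

All four sstables here are the same size, so a purely size-based grouping cannot tell them apart, while the overlap score singles out the two that share the most keys. Four sstables need 6 comparisons; the count grows as n(n-1)/2, which is the quadratic cost the comment is worried about.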