Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1655918AE6 for ; Thu, 30 Jul 2015 21:30:05 +0000 (UTC) Received: (qmail 11306 invoked by uid 500); 30 Jul 2015 21:30:04 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 11247 invoked by uid 500); 30 Jul 2015 21:30:04 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 10908 invoked by uid 99); 30 Jul 2015 21:30:04 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 30 Jul 2015 21:30:04 +0000 Date: Thu, 30 Jul 2015 21:30:04 +0000 (UTC) From: "Benedict (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648337#comment-14648337 ] Benedict commented on CASSANDRA-9471: ------------------------------------- [~aweisberg] said: {quote} It would be nice to see a microbenchmark and benchmark demonstrating that the use of BTree is a win. I am wondering how many additional instructions and branches are hidden inside BTree that you pay for in the common case of small Columns set. A BTree is more instructions for more cache locality, but in the common case will there be enough columns to benefit? {quote} If we want to go down that avenue, we should IMO simply look into specialising the btree iterator for the situation where we have just one leaf, and this is something I've considered. That's pretty much the only likely win. We don't really have time to do the kind of microbenchmarking you're talking about though, and it is of limited use anyway, since one of the main benefits _performancewise_ (in the common case) is improved icache\* occupancy, which cannot be accounted for in a microbenchmark. In the uncommon case this also gives us better lower bounds on behaviour, and access to a more powerfully expressive toolkit. For instance, it permits us to trivially deliver a more efficient multi-way merge (which we do very often), and that can be improved further still with less trivial enhancements. Mostly, though, this patch simply guarantees us a good lower bound on performance, with improved code clarity. Right now I would very much expect performance in the common case to be a wash, especially on any of our existing workloads. > Columns should be backed by a BTree, not an array > ------------------------------------------------- > > Key: CASSANDRA-9471 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Benedict > Assignee: Benedict > Fix For: 3.0 beta 1 > > > Follow up to 8099. > We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. > We don't however want this structure to be either any more expensive to build, nor to store. Some small modifications to BTree will permit it to serve here, by permitting efficient lookup by index, and calculation _of_ index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)