Date: Thu, 30 Apr 2015 19:54:10 +0000 (UTC)
From: "Matthias Broecheler (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6477) Global indexes

    [ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522136#comment-14522136 ]

Matthias Broecheler commented on CASSANDRA-6477:
------------------------------------------------

I think the discussion around materialized views (which I would love to see in C* at some point) is distracting from what this ticket is really about: closing a hole in the indexing story for C*.

In an RDBMS (and pretty much all other database systems), indexes are used to efficiently retrieve a set of rows identified by their column values, in a particular order, at the expense of write performance.
By design, C* builds a 100% selectivity index on the primary key. In addition, one can install secondary indexes. Those secondary indexes are useful up to a certain selectivity. Beyond that threshold, it becomes increasingly efficient to maintain the index as a global distributed hash map rather than as a local index on each node. And that's the hole in the indexing story, because those types of indexes must currently be maintained by the application.

I am stating the obvious here to point out that the first problem is to provide the infrastructure to create that second class of indexes while ensuring some form of (eventual) consistency. Much like with 2i, once that is in place one can use the infrastructure to build other things on top, including materialized views, which will need this to begin with (if the primary key of your materialized view has high selectivity).

As for nomenclature, I agree that "global vs local" index is a technical distinction that has little to no meaning to the user. After all, they want to build an index to get to their data quickly. How that happens is highly secondary. Initially, it might make sense to ask the user to specify a selectivity estimate for the index (defaulting to low) and for C* to pick the best indexing approach based on that. In the future, one could use sampled histograms to help the user with that decision.

> Global indexes
> --------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.x
>
>
> Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
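
The fan-out cost that the ticket describes for local indexes on high-cardinality data can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: `ToyCluster`, `owner`, `query_local`, and `query_global` are hypothetical names invented for illustration, not Cassandra APIs, and the model ignores replication, consistency, updates, and deletes. It only counts how many nodes a lookup has to contact under each indexing approach.

```python
import hashlib
from collections import defaultdict


def owner(key, num_nodes):
    """Map a key to a node with a stable hash (a stand-in for a real partitioner)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ToyCluster:
    """Hypothetical model: rows are partitioned by primary key across nodes."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # Per node: primary key -> row.
        self.rows = [dict() for _ in range(num_nodes)]
        # Local index, per node: indexed value -> primary keys stored on that node.
        self.local_idx = [defaultdict(set) for _ in range(num_nodes)]
        # Global index, partitioned by indexed value: value -> primary keys cluster-wide.
        self.global_idx = [defaultdict(set) for _ in range(num_nodes)]

    def insert(self, pk, row, column):
        value = row[column]
        node = owner(pk, self.num_nodes)
        self.rows[node][pk] = row
        # The local index entry lives on the same node as the row.
        self.local_idx[node][value].add(pk)
        # The global index entry may live on a different node (an extra write).
        self.global_idx[owner(value, self.num_nodes)][value].add(pk)

    def query_local(self, value):
        """Any node may hold a match, so every node is contacted."""
        contacted = set(range(self.num_nodes))
        pks = set()
        for node in contacted:
            pks |= self.local_idx[node].get(value, set())
        return pks, contacted

    def query_global(self, value):
        """Contact the node owning the index entry, then the nodes owning the rows."""
        index_node = owner(value, self.num_nodes)
        pks = set(self.global_idx[index_node].get(value, set()))
        contacted = {index_node} | {owner(pk, self.num_nodes) for pk in pks}
        return pks, contacted


# High-cardinality example: every row has a unique email address.
cluster = ToyCluster(num_nodes=16)
for i in range(1000):
    cluster.insert(pk=i, row={"email": f"user{i}@example.com"}, column="email")

local_pks, local_nodes = cluster.query_local("user42@example.com")
global_pks, global_nodes = cluster.query_global("user42@example.com")
print("local index contacted", len(local_nodes), "nodes")
print("global index contacted", len(global_nodes), "nodes")
```

In this model both lookups return the same single row, but the local-index query touches all 16 nodes, while the global-index query touches at most two: the node holding the index entry and the node holding the row. The price is the additional, possibly remote, index write in `insert`, which is where the consistency question raised in the comment comes in.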