cassandra-commits mailing list archives

From "Matthias Broecheler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6477) Global indexes
Date Thu, 30 Apr 2015 19:54:10 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522136#comment-14522136 ]

Matthias Broecheler commented on CASSANDRA-6477:
------------------------------------------------

I think the discussion around materialized views (which I would love to see in C* at some
point) is distracting from what this ticket is really about: closing a hole in the indexing
story for C*.

In RDBMSs (and pretty much all other database systems), indexes are used to efficiently retrieve
a set of rows identified by their column values in a particular order, at the expense of write
performance. By design, C* builds a 100% selectivity index on the primary key. In addition,
one can install secondary indexes. Those secondary indexes are useful up to a certain selectivity
percentage. Beyond that threshold, it becomes increasingly efficient to maintain the index as
a global distributed hash map rather than as a local index on each node, because a local index
must be consulted on every node while a global one is looked up only on the node that owns the
indexed value. That is the hole in the indexing story, because indexes of this type must
currently be maintained by the application.
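To make the local vs. global distinction concrete, here is a toy Python model (not Cassandra code) of the two index layouts. It assumes a hypothetical 8-node cluster, uses `zlib.crc32` modulo the node count as a stand-in for token ownership, and counts how many nodes a single equality lookup contacts. The names `owner`, `build_indexes`, `query_local`, and `query_global` are illustrative only.

```python
import zlib

NUM_NODES = 8  # hypothetical cluster size


def owner(key: str, num_nodes: int = NUM_NODES) -> int:
    """Toy placement: hash a key onto one of num_nodes nodes (stand-in for token ranges)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def build_indexes(rows: dict[str, str], num_nodes: int = NUM_NODES):
    """Build both layouts for rows mapping primary key -> indexed column value."""
    # Local index: each node indexes only the rows it stores itself.
    local = [dict() for _ in range(num_nodes)]
    # Global index: one logical value -> keys map, partitioned by the hash of the value.
    global_ = [dict() for _ in range(num_nodes)]
    for pk, value in rows.items():
        local[owner(pk, num_nodes)].setdefault(value, set()).add(pk)
        global_[owner(value, num_nodes)].setdefault(value, set()).add(pk)
    return local, global_


def query_local(local, value):
    """Scatter-gather: every node must be asked, even if only a few rows match."""
    hits = set()
    for node_index in local:
        hits |= node_index.get(value, set())
    return hits, len(local)


def query_global(global_, value, num_nodes: int = NUM_NODES):
    """Look up the single index partition, then contact only nodes holding matching rows."""
    index_node = owner(value, num_nodes)
    hits = global_[index_node].get(value, set())
    data_nodes = {owner(pk, num_nodes) for pk in hits}
    # Count distinct nodes contacted: the index node plus the data nodes.
    return hits, len({index_node} | data_nodes)
```

With a unique value such as an e-mail address, `query_local` asks all 8 nodes, while `query_global` touches the index partition's node plus the node holding the one matching row, so at most 2 nodes in this model.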

I am stating the obvious here to point out that the first problem is to provide the infrastructure
to create that second class of indexes while ensuring some form of (eventual) consistency.
Much like with 2i (secondary indexes), once that is in place one can use the infrastructure
to build other things on top, including materialized views, which will need this to begin with
(if the primary key of your materialized view has high selectivity).
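A toy Python sketch of why this needs new infrastructure, assuming a read-before-write approach (one possible design, not something this ticket specifies): when a row's indexed value changes, the stale entry and the new entry usually live on different index partitions, so they cannot be updated in the same local write, and readers may briefly see the old entry until the pending mutations are applied. `ToyGlobalIndex`, `write`, `apply_pending`, and `lookup` are hypothetical names, and `crc32` modulo the node count is only a stand-in for real token ownership.

```python
import zlib

NUM_NODES = 8  # hypothetical cluster size


def owner(key: str, num_nodes: int = NUM_NODES) -> int:
    """Toy placement: hash a key onto one of num_nodes nodes."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


class ToyGlobalIndex:
    """Base table plus a global index partitioned by indexed value (toy model)."""

    def __init__(self, num_nodes: int = NUM_NODES):
        self.num_nodes = num_nodes
        self.base: dict[str, str] = {}  # primary key -> indexed column value
        # One value -> {primary keys} map per node.
        self.partitions = [dict() for _ in range(num_nodes)]
        # Index mutations that have not reached their target partitions yet.
        self.pending: list[tuple[int, str, str, str]] = []

    def write(self, pk: str, value: str) -> None:
        # Read-before-write: the old value tells us which stale entry to remove.
        old = self.base.get(pk)
        self.base[pk] = value
        if old == value:
            return
        if old is not None:
            self.pending.append((owner(old, self.num_nodes), "delete", old, pk))
        self.pending.append((owner(value, self.num_nodes), "insert", value, pk))

    def apply_pending(self) -> None:
        # In a real cluster these would be delivered to (possibly different) nodes
        # asynchronously; here they are simply applied in order.
        for node, op, value, pk in self.pending:
            entries = self.partitions[node].setdefault(value, set())
            if op == "insert":
                entries.add(pk)
            else:
                entries.discard(pk)
        self.pending.clear()

    def lookup(self, value: str) -> set[str]:
        """Read the index partition that owns this value."""
        return set(self.partitions[owner(value, self.num_nodes)].get(value, set()))
```

For example, after `write("u1", "a")` and `apply_pending()`, `lookup("a")` returns `{"u1"}`; a subsequent `write("u1", "b")` queues a delete and an insert, and until they are applied `lookup("a")` still returns the stale `{"u1"}`.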

As for nomenclature, I agree that "global vs. local" index is a technical distinction that
has little to no meaning to the user. After all, users want to build an index to get to their
data quickly; how that happens is highly secondary. Initially, it might make sense to ask
the user to specify a selectivity estimate for the index (defaulting to low) and have C*
pick the best indexing approach based on that. In the future, sampled histograms could
help the user with that decision.
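The selection idea could look something like the following Python sketch. It assumes selectivity means the ratio of distinct indexed values to rows (so a primary key scores 1.0), treats a missing estimate as low, and uses an arbitrary placeholder threshold; `IndexKind`, `choose_index_kind`, `estimate_selectivity`, and the 0.01 cutoff are all hypothetical. The naive sample-based estimator only hints at the histogram idea and can be badly off for small samples.

```python
from enum import Enum
from typing import Optional, Sequence

# Hypothetical cutover point; a real value would have to come from benchmarking.
DEFAULT_THRESHOLD = 0.01


class IndexKind(Enum):
    LOCAL = "local"    # per-node index, as with today's secondary indexes
    GLOBAL = "global"  # distributed hash map partitioned by indexed value


def estimate_selectivity(sample: Sequence[str]) -> float:
    """Naive estimate: distinct values / rows in a sample of the indexed column."""
    if not sample:
        raise ValueError("sample must not be empty")
    return len(set(sample)) / len(sample)


def choose_index_kind(selectivity: Optional[float] = None,
                      threshold: float = DEFAULT_THRESHOLD) -> IndexKind:
    """Pick an index layout from a user-supplied selectivity estimate in [0, 1]."""
    if selectivity is None:
        selectivity = 0.0  # no estimate given: default to low selectivity
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    # Above the threshold, a global index avoids querying every node.
    return IndexKind.GLOBAL if selectivity > threshold else IndexKind.LOCAL
```

`choose_index_kind()` returns `IndexKind.LOCAL`, while `choose_index_kind(estimate_selectivity(["a", "b", "c"]))` returns `IndexKind.GLOBAL`, since every sampled value is distinct.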

> Global indexes
> --------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.x
>
>
> Local indexes are suitable for low-cardinality data, where spreading the index across
> the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying
> most nodes in the cluster even if only a handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
