cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6477) Global indexes
Date Wed, 15 Apr 2015 17:52:04 GMT


Sylvain Lebresne commented on CASSANDRA-6477:

Let's recall that the main problem here is how to keep the index consistent with the original
table. That's typically a problem if, say, 2 clients simultaneously update the same column
to 2 different values: we need to make sure that only whichever of those updates wins ends
up in the index.

Since for global indexes we know we'll have to do a read before write, what has been suggested
here is to do that on replicas, at which point we can serialize concurrent updates locally
to make sure things end up consistent. Now, we could do that on every replica but this has
a few downsides:
# every replica will update the index, so we'll do RF times as many index updates as needed.
# once a replica has done its read and computed the updates for the data table and the index
table, we want to put both of those in a batch mutation to avoid inconsistencies in case of
failures. This makes writes more expensive, and thus the duplication of work all the less desirable.
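The replica-side scheme described above (a read before write, with concurrent updates to the same partition serialized locally) can be sketched roughly as follows. This is a minimal in-memory sketch, not the code in the linked branch: the class `ReplicaWithIndex`, its `write`/`lookup` methods and the single indexed column are hypothetical, and a per-partition `ReentrantLock` stands in for whatever local serialization a real replica would use. Conflicts are resolved by write timestamp (last write wins), so a losing update never touches the index.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory model of ONE replica: a base table with a single indexed
// column, plus the global index on that column. This is not Cassandra's API.
class ReplicaWithIndex {
    // A cell value together with its write timestamp (last write wins).
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // base table: partition key -> current cell of the indexed column
    final Map<String, Cell> data = new ConcurrentHashMap<>();
    // index table: indexed value -> partition keys currently holding that value
    final Map<String, Set<String>> index = new ConcurrentHashMap<>();
    // one lock per partition key, used to serialize concurrent updates locally
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Applies an update; returns true if it won and therefore changed data and index. */
    boolean write(String pk, String value, long timestamp) {
        ReentrantLock lock = locks.computeIfAbsent(pk, k -> new ReentrantLock());
        lock.lock();
        try {
            Cell old = data.get(pk); // the read in "read before write"
            if (old != null && old.timestamp >= timestamp) {
                return false; // loses to the current value (ties keep it in this sketch)
            }
            if (old != null) {
                Set<String> stale = index.get(old.value);
                if (stale != null) {
                    stale.remove(pk); // drop the index entry for the overwritten value
                }
            }
            // A real implementation would ship the data update and both index updates
            // together (e.g. as one batchlog mutation); here they are applied under the lock.
            data.put(pk, new Cell(value, timestamp));
            index.computeIfAbsent(value, v -> ConcurrentHashMap.newKeySet()).add(pk);
            return true;
        } finally {
            lock.unlock();
        }
    }

    Set<String> lookup(String value) {
        return index.getOrDefault(value, Collections.emptySet());
    }
}
```

Because the older update is rejected before it reaches the index, the order in which two racing clients arrive at the replica does not matter: the index ends up pointing only at the value that wins in the base table.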

To avoid that duplication, one possibility is to reuse the same technique we use for counters:
have the coordinator push the update to one random replica, and have that one replica do the
read before write and push everything (data and index updates) through a batchlog mutation.
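A rough sketch of that counter-style path, under stated assumptions: `LeaderWritePath`, `Mutation`, `chooseLeader` and `buildBatch` are hypothetical names, the leader is picked uniformly at random among the partition's replicas, and the batchlog itself is not modeled. The code only shows which mutations the single leader would put into the one batch after its local read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a counter-style write path for a global index.
// Names and structure are illustrative only, not Cassandra's internals.
class LeaderWritePath {
    // A mutation against either the base table or the index table.
    static final class Mutation {
        final String table;
        final String partitionKey;
        final String operation;

        Mutation(String table, String partitionKey, String operation) {
            this.table = table;
            this.partitionKey = partitionKey;
            this.operation = operation;
        }

        @Override
        public String toString() {
            return table + "[" + partitionKey + "]: " + operation;
        }
    }

    // Coordinator side: forward the update to a single replica chosen at random.
    static String chooseLeader(List<String> replicas, Random random) {
        return replicas.get(random.nextInt(replicas.size()));
    }

    // Leader side: given the value it just read locally (null if none), build ONE batch
    // holding the base-table update and every index update it implies. The index is
    // partitioned by the indexed value, so the old and new index entries may live on
    // different nodes; sending them in one logged batch keeps them from diverging if
    // the leader fails partway through.
    static List<Mutation> buildBatch(String pk, String oldValue, String newValue) {
        List<Mutation> batch = new ArrayList<>();
        batch.add(new Mutation("base", pk, "set " + newValue));
        if (!newValue.equals(oldValue)) {
            if (oldValue != null) {
                batch.add(new Mutation("index", oldValue, "delete " + pk));
            }
            batch.add(new Mutation("index", newValue, "insert " + pk));
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("node1", "node2", "node3");
        String leader = chooseLeader(replicas, new Random());
        // The leader has read "red" for k1 and now applies the update k1 = "blue".
        List<Mutation> batch = buildBatch("k1", "red", "blue");
        System.out.println(leader + " writes through the batchlog: " + batch);
    }
}
```

Only the leader performs the read and computes index mutations, so the index work is done once per update instead of RF times; the other replicas just receive the resulting mutations.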

The currently linked branch doesn't do all of that yet so it'll have to be added before we
can commit this.

On top of this, I think that we'll need 2 other things that are not handled yet by the branch:
* being able to index tables that have collections. Indexing the collections themselves, which is
also not yet supported, can probably be left to a follow-up ticket.
* make sure we hook the index rebuild into streaming so that when the data table is repaired,
the index is repaired too.
Once those have been tackled, I think we can call it good for an initial version and leave other
improvements to follow-ups.

> Global indexes
> --------------
>                 Key: CASSANDRA-6477
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0
> Local indexes are suitable for low-cardinality data, where spreading the index across
> the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying
> most nodes in the cluster even if only a handful of rows are returned.
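To make the cardinality argument concrete, here is a toy sketch assuming a hypothetical cluster with replication factor 1 and hash-based placement; `ToyCluster` and its methods are illustrative, not Cassandra code. It counts the nodes contacted when looking up a value that matches a single row.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy cluster with replication factor 1: every key lives on exactly one
// node, chosen by hashing. It only counts how many nodes an index lookup touches.
class ToyCluster {
    final int nodes;
    // global index: indexed value -> partition keys holding that value
    final Map<String, Set<String>> globalIndex = new HashMap<>();

    ToyCluster(int nodes) {
        this.nodes = nodes;
    }

    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes);
    }

    // Insert-only for simplicity: records that partition `pk` has `value`.
    void insert(String pk, String value) {
        globalIndex.computeIfAbsent(value, v -> new TreeSet<>()).add(pk);
    }

    // Local index: each node indexes only the rows it stores, so a lookup cannot know
    // which nodes hold matches and has to ask all of them to find every match.
    Set<Integer> nodesContactedLocal(String value) {
        Set<Integer> contacted = new TreeSet<>();
        for (int n = 0; n < nodes; n++) {
            contacted.add(n);
        }
        return contacted;
    }

    // Global index: the index partition is keyed by the value itself, so it lives on a
    // single node; only the owners of the partition keys it lists are contacted next.
    Set<Integer> nodesContactedGlobal(String value) {
        Set<Integer> contacted = new TreeSet<>();
        contacted.add(nodeFor(value));
        for (String pk : globalIndex.getOrDefault(value, Collections.emptySet())) {
            contacted.add(nodeFor(pk));
        }
        return contacted;
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster(20);
        cluster.insert("user42", "alice@example.com"); // high-cardinality value: one match
        System.out.println("local:  " + cluster.nodesContactedLocal("alice@example.com").size());
        System.out.println("global: " + cluster.nodesContactedGlobal("alice@example.com").size());
    }
}
```

With 20 nodes, the local lookup touches all 20, while the global lookup touches at most 2 (the node holding the index partition and the node owning `user42`).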

This message was sent by Atlassian JIRA
