cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13499) Avoid duplicate calls to the same custom row index
Date Mon, 15 May 2017 15:06:04 GMT


Sam Tunnicliffe commented on CASSANDRA-13499:

bq. these implementations do not allow mixing various index implementations, e.g. when you have
a regular index for column A and a custom row-based index for columns B+C

No, that's not the case. It's perfectly possible to mix indexes in exactly the way you describe.
At update time, each index is consulted as to whether it is interested in the incoming update,
in {{SecondaryIndexManager::newUpdateTransaction}}. Given the set of columns that the update contains,
the index implementation either returns an {{Indexer}} if it should process the update, or null if not.
One of the drivers for reworking the index API in CASSANDRA-9459 was to make row-based indexes
less of a hack that piggybacks on column-based indexes.
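A minimal sketch of that per-update handshake, assuming heavily simplified, hypothetical types: {{SimpleIndex}}, {{Indexer}}, {{ColumnSetIndex}} and {{UpdateDispatch}} below are illustrations only, not Cassandra's real {{Index}} or {{SecondaryIndexManager}} signatures. A regular index on column a and a row-based index on columns b and c are registered side by side, and each one decides from the update's column set whether to return an indexer or null:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical dispatcher (declared first so it can be launched as a single-file program).
final class UpdateDispatch {
    // Consult every registered index; only those returning a non-null Indexer see the row.
    static List<String> apply(List<SimpleIndex> indexes, Set<String> updatedColumns, String row) {
        List<String> notified = new ArrayList<>();
        for (SimpleIndex index : indexes) {
            Indexer indexer = index.indexerFor(updatedColumns);
            if (indexer != null) {
                indexer.insertRow(row);
                notified.add(index.name());
            }
        }
        return notified;
    }

    public static void main(String[] args) {
        ColumnSetIndex regular = new ColumnSetIndex("idx_a", Set.of("a"));
        ColumnSetIndex rowBased = new ColumnSetIndex("idx_bc", Set.of("b", "c"));
        List<SimpleIndex> all = List.of(regular, rowBased);

        System.out.println(apply(all, Set.of("a"), "row1"));      // [idx_a]
        System.out.println(apply(all, Set.of("b", "c"), "row2")); // [idx_bc]
    }
}

// Hypothetical, simplified stand-in for a per-update indexer.
interface Indexer {
    void insertRow(String row);
}

// Hypothetical, simplified stand-in for an index implementation.
interface SimpleIndex {
    String name();

    // Returns an Indexer if this index cares about any of the updated columns, else null.
    Indexer indexerFor(Set<String> updatedColumns);
}

// Covers a fixed set of columns: one column for a regular index, several for a row-based one.
final class ColumnSetIndex implements SimpleIndex {
    private final String name;
    private final Set<String> columns;
    final List<String> seen = new ArrayList<>(); // rows this index was asked to process

    ColumnSetIndex(String name, Set<String> columns) {
        this.name = name;
        this.columns = columns;
    }

    public String name() {
        return name;
    }

    public Indexer indexerFor(Set<String> updatedColumns) {
        for (String column : updatedColumns) {
            if (columns.contains(column)) {
                return row -> seen.add(row);
            }
        }
        return null; // not interested in this update
    }
}
```

An update touching both a and b would be routed to both indexes, which is the "mixing" case from the quoted question.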

It's actually more of an issue to determine the correct index for a given query, as the built-in,
heuristics-based approach may not be ideal for every implementation. Essentially, at query
time each registered index that supports at least one of the query's index expressions provides
an estimated result count. The naive selection approach simply chooses whichever index
expects to return the fewest results. Clearly, this is quite simplistic (though {{Index}} implementations
are free to decide how they arrive at the estimate), so there is a means to force the use
of a specific index using custom expressions. I'll refer to the Stratio implementation for
an example:

SELECT * FROM tweets WHERE expr(tweets_index, '{
   filter: [
      {type: "range", field: "time", lower: "2014/04/25", upper: "2014/05/01"},
      {type: "prefix", field: "user", value: "a"},
      {type: "geo_distance", field: "place", latitude: 40.3930, longitude: -3.7328, max_distance: ...}
   ],
   query: {type: "phrase", field: "body", value: "big data gives organizations", slop: 1},
   sort: {field: "time", reverse: true}
}') limit 100;

With a custom expression, the first argument is the index name, and the presence of a custom
expression in the query ensures that the named index is used. The second argument is the
implementation-specific query information.
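The selection rule described above can be sketched roughly as follows. The {{IndexSelection}} and {{Candidate}} types and the estimate figures are hypothetical, and Cassandra's actual query planning is more involved (for instance, this sketch does not check that the named index exists). If the query carries a custom expression, the index it names wins outright; otherwise, among the indexes supporting at least one expression, the one with the smallest estimated result count is chosen:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class IndexSelection {
    // Hypothetical candidate: an index name, whether it supports one of the query's
    // expressions, and its self-reported estimate of how many rows it would return.
    static final class Candidate {
        final String name;
        final boolean supportsExpression;
        final long estimatedResults;

        Candidate(String name, boolean supportsExpression, long estimatedResults) {
            this.name = name;
            this.supportsExpression = supportsExpression;
            this.estimatedResults = estimatedResults;
        }
    }

    // customTarget is the index named by a custom expression such as expr(tweets_index, '...'),
    // or empty if the query contains none.
    static Optional<String> select(List<Candidate> candidates, Optional<String> customTarget) {
        if (customTarget.isPresent()) {
            return customTarget; // an explicit choice bypasses the estimate heuristic
        }
        return candidates.stream()
                .filter(c -> c.supportsExpression)
                .min(Comparator.comparingLong((Candidate c) -> c.estimatedResults))
                .map(c -> c.name);
    }

    public static void main(String[] args) {
        // Made-up estimates, purely for illustration.
        List<Candidate> candidates = List.of(
                new Candidate("time_idx", true, 5_000),
                new Candidate("user_idx", true, 120),
                new Candidate("tweets_index", true, 900));
        System.out.println(select(candidates, Optional.empty()));             // Optional[user_idx]
        System.out.println(select(candidates, Optional.of("tweets_index"))); // Optional[tweets_index]
    }
}
```

The weakness mentioned above is visible here: the outcome depends entirely on each implementation's own estimate, which is why the explicit override exists.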

bq. unfortunately, the alternative syntax CREATE INDEX ... <table>(col1,col2....) cannot
be updated (no ALTER INDEX statement....).

As you point out, there is no support for {{ALTER INDEX}}, but this is primarily because any
modification of an index definition is probably going to require a rebuild of the existing
index. So in order for {{ALTER}} to be more useful than simply {{DROP INDEX..CREATE INDEX}},
it would need to manage the rebuild in the background so that the old index continued to be
used until the new one was ready and then perform the swap operation. This is not to say that
that couldn't be done, just that currently it isn't. My point is that this is not specific
to any particular type of index.
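Purely as an illustration of what such a managed rebuild might look like (Cassandra does not do this today, and {{SwappableIndex}} and its methods are hypothetical), the old definition keeps serving reads while the replacement builds asynchronously, and an atomic reference is swapped only once the build has completed:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class SwappableIndex<I> {
    private final AtomicReference<I> live;

    SwappableIndex(I initial) {
        this.live = new AtomicReference<>(initial);
    }

    // Queries always read whichever index is currently live.
    I current() {
        return live.get();
    }

    // Builds the replacement in the background. The old index stays live until the
    // build finishes; a single atomic swap then publishes the new one. The returned
    // future yields the retired index so the caller can drop it.
    CompletableFuture<I> rebuild(Supplier<I> builder) {
        return CompletableFuture.supplyAsync(builder)
                .thenApply(newIndex -> live.getAndSet(newIndex));
    }

    public static void main(String[] args) {
        SwappableIndex<String> index = new SwappableIndex<>("v1 (col1, col2)");
        CompletableFuture<String> pending = index.rebuild(() -> "v2 (col1, col2, col3)");
        // Until the build completes, readers keep seeing v1, never a half-built index.
        String retired = pending.join();
        System.out.println(retired);         // v1 (col1, col2)
        System.out.println(index.current()); // v2 (col1, col2, col3)
    }
}
```

A real implementation would presumably also need to apply writes arriving during the rebuild to both the old and the new index; this sketch ignores that.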

> Avoid duplicate calls to the same custom row index
> --------------------------------------------------
>                 Key: CASSANDRA-13499
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: vincent royer
>            Priority: Minor
>             Fix For: 3.0.14, 3.11.0, 4.x
>         Attachments: 0006-Avoid-duplicate-calls-to-the-same-custom-index.patch
>   Original Estimate: 2h
>  Remaining Estimate: 2h
> Avoid duplicate calls to the same custom row index by using a dedicated Set<Index>
> rather than the collection indexes.values().
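The idea in the description can be illustrated with a hypothetical registry in which one row-based index instance is registered under several keys (one per column it covers). That keying is an assumption made for illustration, not a claim about how the patch or {{SecondaryIndexManager}} stores indexes. Iterating {{values()}} visits the shared instance once per key, whereas a dedicated set visits it once:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class DedupIndexes {
    // Hypothetical index that only counts how often it is invoked.
    static final class CountingIndex {
        int calls;

        void onUpdate() {
            calls++;
        }
    }

    // Invokes every index yielded by the given collection.
    static void dispatch(Iterable<CountingIndex> targets) {
        for (CountingIndex index : targets) {
            index.onUpdate();
        }
    }

    // Collapses the registry to distinct instances, compared by identity.
    static Set<CountingIndex> distinct(Map<String, CountingIndex> registry) {
        Set<CountingIndex> set = Collections.newSetFromMap(new IdentityHashMap<>());
        set.addAll(registry.values());
        return set;
    }

    public static void main(String[] args) {
        CountingIndex rowIndex = new CountingIndex();
        Map<String, CountingIndex> indexes = new LinkedHashMap<>();
        indexes.put("col_b", rowIndex); // same instance registered twice
        indexes.put("col_c", rowIndex);

        dispatch(indexes.values());
        System.out.println(rowIndex.calls); // 2: duplicate call

        rowIndex.calls = 0;
        dispatch(distinct(indexes));
        System.out.println(rowIndex.calls); // 1
    }
}
```

An identity-based set keeps the deduplication independent of any {{equals}} override; a plain {{HashSet}} would behave the same for classes that do not override {{equals}}.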

This message was sent by Atlassian JIRA
