cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-10436) Index selection should be weighted in favour of custom expressions
Date Fri, 02 Oct 2015 12:31:26 GMT
Sam Tunnicliffe created CASSANDRA-10436:

             Summary: Index selection should be weighted in favour of custom expressions
                 Key: CASSANDRA-10436
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Sam Tunnicliffe
            Assignee: Sam Tunnicliffe
             Fix For: 3.0.0 rc2

If a SELECT contains a custom index expression (CASSANDRA-10217), that expression should always
be chosen as the primary expression during query execution. If the statement also contains other
expressions which can be satisfied by a built-in index, we don't currently have the ability to
apply the custom expression as a filter. What's more, the method of selecting which index to use
is fairly primitive (and cannot be overridden until CASSANDRA-10214), so we should ensure that
a custom expression, if present, is always chosen.
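
The selection rule proposed above can be sketched roughly as follows. This is a toy model in Python, not Cassandra's actual code: the names Expression, Index, estimated_rows and select_primary_index are hypothetical, and the "lowest estimate wins" fallback is only an assumption standing in for the existing selection method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Expression:
    # For a custom expression, target is the index name (as in expr(v2_idx, ...));
    # otherwise it is the column name. Hypothetical model, not a Cassandra class.
    target: str
    value: object
    is_custom: bool = False


@dataclass(frozen=True)
class Index:
    name: str
    column: str
    estimated_rows: int  # assumed selectivity estimate reported by the index

    def supports(self, expr: Expression) -> bool:
        if expr.is_custom:
            return expr.target == self.name
        return expr.target == self.column


def select_primary_index(expressions: Sequence[Expression],
                         indexes: Sequence[Index]) -> Optional[Index]:
    # Rule from the ticket: a custom expression, if present, always wins.
    for expr in expressions:
        if expr.is_custom:
            for idx in indexes:
                if idx.supports(expr):
                    return idx
    # Assumed fallback: pick the supporting index with the lowest estimate.
    candidates = [idx for idx in indexes
                  if any(idx.supports(e) for e in expressions)]
    return min(candidates, key=lambda idx: idx.estimated_rows, default=None)
```

Note that in this sketch the custom index is chosen even when a built-in index reports a lower estimate, which is the weighting the ticket asks for.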

Suppose we have a custom index implementation which provides prefix matching on text fields:

CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
CREATE INDEX v1_idx ON ks.t(v1);
CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';

INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');

SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*');

In the above example the expected result would contain no rows, which would be the case if
{{v2_idx}} is selected as the primary (i.e. most selective) index during query execution.
However, if {{v1_idx}} is chosen instead, the results of its lookup will have no further filter
applied, so the row with k=0 will be returned, which is incorrect.
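
To make the two outcomes concrete, here is a small in-memory simulation of the example. It is only an illustration under stated assumptions: the lookup functions and run_query are hypothetical stand-ins, and the prefix semantics of 'd*' are assumed from the description of the custom index above.

```python
# The two rows inserted in the example.
rows = [
    {"k": 0, "v1": 0, "v2": "abc"},
    {"k": 1, "v1": 1, "v2": "def"},
]


def lookup_v1(value):
    # Stand-in for a lookup against the built-in index v1_idx.
    return [r for r in rows if r["v1"] == value]


def lookup_v2_prefix(pattern):
    # Stand-in for the custom index v2_idx; 'd*' is assumed to mean "starts with d".
    prefix = pattern.rstrip("*")
    return [r for r in rows if r["v2"].startswith(prefix)]


def run_query(primary):
    # Models: SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*');
    if primary == "v2_idx":
        # v1 = 0 is an ordinary expression, so it can be applied as a post-filter.
        return [r for r in lookup_v2_prefix("d*") if r["v1"] == 0]
    # With v1_idx as primary, the custom expression cannot be applied as a
    # filter, so the lookup results are returned unfiltered.
    return lookup_v1(0)


correct = run_query("v2_idx")    # no rows
incorrect = run_query("v1_idx")  # the k=0 row, although 'abc' does not match 'd*'
```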

Note: this has always been something of an issue for custom indexes, as the expressions they
support may not be natively filterable by C*. For example, with the full text search syntax
used by Stratio & DSE Search, if the custom index isn't selected, the filtering will erroneously
remove all rows, as the value of the dummy column does not match the Lucene/Solr search expression
literal. It's probably a fairly minor concern, as in most cases a query using a custom index
will not include other expressions (usually because custom indexes are per-row indexes, and
so can support multi-field expression syntax). Also, an index implementation can return a
very low estimated result count to try to ensure it is selected. Custom expressions
just provide an opportunity to improve the situation.
