cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-10436) Index selection should be weighted in favour of custom expressions
Date Fri, 02 Oct 2015 12:31:26 GMT
Sam Tunnicliffe created CASSANDRA-10436:
-------------------------------------------

             Summary: Index selection should be weighted in favour of custom expressions
                 Key: CASSANDRA-10436
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10436
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Sam Tunnicliffe
            Assignee: Sam Tunnicliffe
             Fix For: 3.0.0 rc2


If a SELECT contains a custom index expression (CASSANDRA-10217), that expression should always
be chosen as the primary expression during query execution. Should the statement contain other
expressions which can be satisfied by a built-in index, we don't currently have the ability to
apply the custom expression as a filter. What's more, the method of selecting which index to use
is fairly primitive (and cannot be overridden until CASSANDRA-10214), so we should ensure that
a custom expression, if present, is always chosen.
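
The proposed rule can be sketched roughly as follows. This is only an illustrative model in
Python, not Cassandra's actual code: the {{Expression}} class, its fields and
{{select_primary}} are hypothetical names, and the fallback of picking the lowest estimated
result count is an assumption about how selection works when no custom expression is present.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Expression:
    """Hypothetical model of one indexed expression in a WHERE clause."""
    index_name: str          # index able to serve this expression
    estimated_rows: int      # result-count estimate reported by that index
    is_custom: bool = False  # True for expr(index, '...') style clauses


def select_primary(expressions: List[Expression]) -> Optional[Expression]:
    """Pick the expression that drives the index lookup."""
    if not expressions:
        return None
    # A custom expression cannot be applied as a post-lookup filter,
    # so it wins regardless of any estimates.
    custom = [e for e in expressions if e.is_custom]
    if custom:
        return custom[0]
    # Assumed fallback: choose the lowest estimated result count.
    return min(expressions, key=lambda e: e.estimated_rows)


v1 = Expression("v1_idx", estimated_rows=1)
v2 = Expression("v2_idx", estimated_rows=50, is_custom=True)
chosen = select_primary([v1, v2])
print(chosen.index_name)
```

In this sketch the custom expression is chosen even though the built-in index reports a lower
estimate, which is the weighting this ticket asks for.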

Suppose we have a custom index implementation which provides prefix matching on text fields.
{code}
CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
CREATE INDEX v1_idx ON ks.t(v1);
CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';

INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');

SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*');
{code}

In the above example the expected result would contain no rows, which would be the case if
{{v2_idx}} is selected as the primary (i.e. most selective) index during query execution.
However, if {{v1_idx}} is chosen instead, the results of its lookup will have no further filter
applied, so the row with {{k=0}} will be returned, which is incorrect.
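
To make the two outcomes concrete, here is a small in-memory simulation of the example in
Python. It is a sketch under stated assumptions, not Cassandra code: the lookup functions are
hypothetical stand-ins for the two indexes, and prefix matching for {{'d*'}} is assumed to
mean "starts with d".

```python
# The two rows inserted in the example above.
ROWS = [
    {"k": 0, "v1": 0, "v2": "abc"},
    {"k": 1, "v1": 1, "v2": "def"},
]


def v1_lookup(value):
    """Stand-in for the built-in index on v1 (equality match)."""
    return [r for r in ROWS if r["v1"] == value]


def v2_prefix_lookup(pattern):
    """Stand-in for the custom index on v2 (assumed prefix match)."""
    prefix = pattern.rstrip("*")
    return [r for r in ROWS if r["v2"].startswith(prefix)]


def query_with_v1_primary():
    # The custom expression cannot be applied as a filter, so the
    # rows from the v1 lookup are returned as they are.
    return v1_lookup(0)


def query_with_v2_primary():
    # The v1 = 0 restriction is natively filterable, so it is applied
    # to the rows returned by the custom index.
    return [r for r in v2_prefix_lookup("d*") if r["v1"] == 0]


print(query_with_v1_primary())
print(query_with_v2_primary())
```

With {{v1_idx}} as primary the simulation returns the row with {{k=0}}; with {{v2_idx}} as
primary it returns no rows, which is the expected result.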


Note: this has always been something of an issue for custom indexes as the expressions they
support may not be natively filterable by C*. For example, with the full text search syntax
used by Stratio & DSE Search, if the custom index isn't selected the filtering will erroneously
remove all rows as the value of the dummy column does not match the Lucene/Solr search expression
literal. It's probably a fairly minor concern as in most cases a query using a custom index
will not include other expressions (usually because custom indexes are per-row indexes, and
so can support multi-field expression syntax). Also, an index implementation can return a
very low estimated result count to try to ensure it is selected; custom expressions
just provide an opportunity to improve the situation.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
