cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-10436) Index selection should be weighted in favour of custom expressions
Date Fri, 04 Dec 2015 20:05:11 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-10436:
----------------------------------------
    Component/s: CQL

> Index selection should be weighted in favour of custom expressions
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-10436
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10436
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0.0 rc2
>
>
> If a SELECT contains a custom index expression (CASSANDRA-10217), that expression should always
be chosen as the primary expression during query execution. Should the statement contain other
expressions which can be satisfied by a built-in index, we don't currently have the ability
to apply the custom expression as a filter. What's more, the method of selecting which index
to use is fairly primitive (and cannot be overridden until CASSANDRA-10214), so we should
ensure that a custom expression, if present, is always chosen.
> Suppose we have a custom index implementation which provides prefix matching on text
fields.
> {code}
> CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
> CREATE INDEX v1_idx ON ks.t(v1);
> CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';
> INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
> INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');
> SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*') ALLOW FILTERING;
> {code}
> In the above example the expected result would contain no rows, which would be the case
if {{v2_idx}} is selected as the primary (i.e. most selective) index during query execution.
However, if {{v1_idx}} is chosen instead, the custom expression is not applied as a filter to
the results of its lookup, and so an incorrect result (the row with k=0) will be returned.
> Note: this has always been something of an issue for custom indexes, as the expressions
they support may not be natively filterable by C*. For example, with the full text search
syntax used by Stratio & DSE Search, if the custom index isn't selected the filtering
will erroneously remove all rows, as the value of the dummy column does not match the Lucene/Solr
search expression literal. It's probably a fairly minor concern, as in most cases a query using
a custom index will not include other expressions (usually because custom indexes are per-row
indexes, and so can support multi-field expression syntax). Also, an index implementation
can return a very low estimated result count to try to ensure it is selected; custom
expressions just provide an opportunity to improve the situation.
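
The selection problem described above can be sketched with a small toy model. This is not Cassandra's code: the Expression class, the function names and the estimated row counts are all hypothetical, chosen only to mirror the ks.t example. It assumes the primary index is picked by lowest estimated result count, and that custom expressions are never applied as a post-lookup filter.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

Row = Dict[str, object]

# The two rows inserted in the ks.t example.
ROWS: List[Row] = [
    {"k": 0, "v1": 0, "v2": "abc"},
    {"k": 1, "v1": 1, "v2": "def"},
]


@dataclass
class Expression:
    """Hypothetical stand-in for one WHERE-clause expression."""
    index_name: str                 # index able to serve the expression
    matches: Callable[[Row], bool]  # rows the index lookup would return
    estimated_rows: int             # made-up estimate reported by the index
    is_custom: bool = False         # True for expr(index, '...') clauses


def execute(expressions: List[Expression], primary: Expression) -> List[Row]:
    """Look up rows via the primary index, then filter by the other expressions."""
    rows = [r for r in ROWS if primary.matches(r)]
    for e in expressions:
        if e is primary or e.is_custom:
            # Assumption from the ticket: a custom expression
            # cannot be applied as a filter, so it is skipped here.
            continue
        rows = [r for r in rows if e.matches(r)]
    return rows


def select_by_estimate(expressions: List[Expression]) -> Expression:
    """Simplified selection: lowest estimated result count wins."""
    return min(expressions, key=lambda e: e.estimated_rows)


def select_prefer_custom(expressions: List[Expression]) -> Expression:
    """Proposed weighting: a custom expression, if present, always wins."""
    custom = [e for e in expressions if e.is_custom]
    return select_by_estimate(custom or expressions)


# WHERE v1=0 AND expr(v2_idx, 'd*'); the estimates 1 and 5 are invented.
QUERY = [
    Expression("v1_idx", lambda r: r["v1"] == 0, estimated_rows=1),
    Expression("v2_idx", lambda r: str(r["v2"]).startswith("d"),
               estimated_rows=5, is_custom=True),
]

# v1_idx is chosen, the prefix expression is never applied: wrong result.
print(execute(QUERY, select_by_estimate(QUERY)))    # [{'k': 0, 'v1': 0, 'v2': 'abc'}]
# v2_idx is chosen, v1=0 is applied as a filter: no rows, as expected.
print(execute(QUERY, select_prefer_custom(QUERY)))  # []
```

The workaround mentioned in the note corresponds, in this model, to the custom index reporting a very low estimate, e.g. replace(QUERY[1], estimated_rows=0), which makes select_by_estimate pick v2_idx as well. Preferring custom expressions outright removes the dependence on that estimate.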



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
