cassandra-commits mailing list archives

From "Jeremiah Jordan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
Date Tue, 02 Dec 2014 18:45:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231898#comment-14231898 ]

Jeremiah Jordan commented on CASSANDRA-4476:
--------------------------------------------

I think you need to revisit the issue of result ordering.  Unless the full result set is
in token order, you cannot page through the results from the secondary index.  Both internal
and user-driven paging rely on starting the next "page" from the token on which the
previous page ended.  With an implementation that does not return results in token order,
you cannot send the "end token" of the previous page as the "start token" for the next
page: doing so skips every value in the following index rows whose token is before that
end token.  For example:

Dataset:
{noformat}
(token(key), indexed)
(1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6)
{noformat}

{noformat}
select token(key),indexed from temp where indexed > 4 limit 3;
3, 5
4, 5
5, 5
{noformat}

Then, without results in proper token order, the next page is:

{noformat}
select token(key),indexed from temp where indexed > 4 and token(key) > 5 limit 3;
6, 5
7, 6
8, 6
{noformat}

You just skipped (1, 6) and (2, 6), and you cannot get them back.
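
To make the skip concrete, here is a minimal Python sketch that replays the example above. It is only a simulation, not Cassandra code: `query`, `page_through`, `by_token`, and `by_index_value` are hypothetical helpers, and the "index order" (indexed value first, then token) is an assumption inferred from the result pages shown above.

```python
# Rows are (token(key), indexed), copied from the dataset above.
DATASET = [(1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6)]


def by_token(row):
    """Order results by token(key), which is what paging expects."""
    return row[0]


def by_index_value(row):
    """Assumed index order: indexed value first, then token within that value."""
    return (row[1], row[0])


def query(rows, min_indexed, start_token, limit, order):
    """Simulate: WHERE indexed > min_indexed [AND token(key) > start_token] LIMIT limit."""
    matching = [
        row for row in rows
        if row[1] > min_indexed and (start_token is None or row[0] > start_token)
    ]
    return sorted(matching, key=order)[:limit]


def page_through(rows, min_indexed, limit, order):
    """Page by reusing the last token of each page as the next start token."""
    seen = []
    start_token = None
    while True:
        page = query(rows, min_indexed, start_token, limit, order)
        if not page:
            return seen
        seen.extend(page)
        start_token = page[-1][0]  # the "end token" of this page


index_paged = page_through(DATASET, 4, 3, by_index_value)
token_paged = page_through(DATASET, 4, 3, by_token)
missed = sorted(set(DATASET) - set(index_paged))
print("rows lost when paging in index order:", missed)
```

Under these assumptions, paging in index order never returns (1, 6) and (2, 6), while paging in token order visits all eight rows.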


> Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4476
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>            Reporter: Sylvain Lebresne
>            Assignee: Oded Peer
>            Priority: Minor
>              Labels: cql
>             Fix For: 3.0
>
>         Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch
>
>
> Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed
> column). Given that indexed CFs are local (and use LocalPartitioner, which orders the rows by
> the type of the indexed column), we should extend 2ndary indexes to allow querying indexed
> columns even when no EQ clause is provided.
> As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate().
> I.e. how do we estimate the selectivity of non-EQ clauses? I note however that if we can do
> that estimate reasonably accurately, this might provide better performance even for index
> queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have a much better
> selectivity than EQ ones (say you index both the user country and birth date; for SELECT *
> FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009',
> you'd better use the birthdate index first).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
