cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8131) Short-circuited query results from collection index query
Date Tue, 21 Oct 2014 09:57:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178215#comment-14178215
] 

Sylvain Lebresne commented on CASSANDRA-8131:
---------------------------------------------

bq. What 2.1 minor version do you think will have this fix available?

If you mean, in which minor version will the query return the right result (i.e. nothing),
then it will be 2.1.1. The patch on this ticket will probably only make 2.1.2, but that patch
only fix the {{ALLOW FILTERING}} validation issue.

> Short-circuited query results from collection index query
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8131
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8131
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Debian Wheezy, Oracle JDK, Cassandra 2.1
>            Reporter: Catalin Alexandru Zamfir
>            Assignee: Benjamin Lerer
>              Labels: collections, cql3, cqlsh, query, queryparser, triaged
>             Fix For: 2.1.2
>
>         Attachments: CASSANDRA-8131.txt
>
>
> After watching Jonathan's 2014 summit video, I wanted to give collection indexes a try
as they seem to be a fit for a "search by key/values" usage pattern we have in our setup.
Doing some test queries that I expect users would do against the table, a short-circuit behavior
came up:
> Here's the whole transcript:
> {noformat}
> CREATE TABLE by_sets (id int PRIMARY KEY, datakeys set<text>, datavars set<text>);
> CREATE INDEX by_sets_datakeys ON by_sets (datakeys);
> CREATE INDEX by_sets_datavars ON by_sets (datavars);
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (1, {'a'}, {'b'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (2, {'c'}, {'d'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (3, {'e'}, {'f'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (4, {'a'}, {'z'});
> SELECT * FROM by_sets;
>  id | datakeys | datavars
> ----+----------+----------
>   1 |    {'a'} |    {'b'}
>   2 |    {'c'} |    {'d'}
>   4 |    {'a'} |    {'z'}
>   3 |    {'e'} |    {'f'}
> {noformat}
> We then tried this query which short-circuited:
> {noformat}
> SELECT * FROM by_sets WHERE datakeys CONTAINS 'a' AND datakeys CONTAINS 'c';
>  id | datakeys | datavars
> ----+----------+----------
>   1 |    {'a'} |    {'b'}
>   4 |    {'a'} |    {'z'}
> (2 rows)
> {noformat}
> Instead of receveing 3 rows, which match the datakeys CONTAINS 'a' AND datakeys CONTAINS
'c' we only got the first.
> Doing the same, but with CONTAINS 'c' first, ignores the second AND.
> {noformat}
> SELECT * FROM by_sets WHERE datakeys CONTAINS 'c' AND datakeys CONTAINS 'a' ;
>  id | datakeys | datavars
> ----+----------+----------
>   2 |    {'c'} |    {'d'}
> (1 rows)
> {noformat}
> Also, on a side-note, I have two indexes on both datakeys and datavars. But when trying
to run a query such as:
> {noformat}
> select * from by_sets WHERE datakeys CONTAINS 'a' AND datavars CONTAINS 'z';
> code=2200 [Invalid query] message="Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance. 
> If you want to execute this query despite the performance unpredictability, use ALLOW
FILTERING"
> {noformat}
> The second column, after AND (even if I inverse the order) requires an "allow filtering"
clause yet the column is indexed an an in-memory "join" of the primary keys of these sets
on the coordinator could build up the result.
> Could anyone explain the short-circuit behavior?
> And the requirement for "allow-filtering" on a secondly indexed column?
> If they're not bugs but intended they should be documented better, at least their limitations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message