cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8131) Short-circuited query results from collection index query
Date Tue, 21 Oct 2014 10:11:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178225#comment-14178225
] 

Sylvain Lebresne commented on CASSANDRA-8131:
---------------------------------------------

Regarding the patch, mostly lgtm but 2 minor nits:
* The {{Iterables.getLast}} in {{needFiltering}} feels misguiding in that there is no reason
to use the last. It just happen that we're in a branch where {{stmt.restrictedColumns}} can
only have one element and we want that single element. So I'd rather use {{Iterables.getOnlyElement}}.
* In {{isRestrictedByMultipleContains}}, I'd either check that {{metadataRestrictions.get()}}
don't return {{null}} or at least assert it (I know it can't be {{null}} in the context of
this patch, but it would be easy to misuse that method in the future). Similarly, I'd add
a {{instanceof Contains}} on the result of that {{get}} so that we don't have a bug waiting
to happen if we ever allow other type of restriction on collections.

> Short-circuited query results from collection index query
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8131
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8131
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Debian Wheezy, Oracle JDK, Cassandra 2.1
>            Reporter: Catalin Alexandru Zamfir
>            Assignee: Benjamin Lerer
>              Labels: collections, cql3, cqlsh, query, queryparser, triaged
>             Fix For: 2.1.2
>
>         Attachments: CASSANDRA-8131.txt
>
>
> After watching Jonathan's 2014 summit video, I wanted to give collection indexes a try
as they seem to be a fit for a "search by key/values" usage pattern we have in our setup.
Doing some test queries that I expect users would do against the table, a short-circuit behavior
came up:
> Here's the whole transcript:
> {noformat}
> CREATE TABLE by_sets (id int PRIMARY KEY, datakeys set<text>, datavars set<text>);
> CREATE INDEX by_sets_datakeys ON by_sets (datakeys);
> CREATE INDEX by_sets_datavars ON by_sets (datavars);
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (1, {'a'}, {'b'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (2, {'c'}, {'d'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (3, {'e'}, {'f'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (4, {'a'}, {'z'});
> SELECT * FROM by_sets;
>  id | datakeys | datavars
> ----+----------+----------
>   1 |    {'a'} |    {'b'}
>   2 |    {'c'} |    {'d'}
>   4 |    {'a'} |    {'z'}
>   3 |    {'e'} |    {'f'}
> {noformat}
> We then tried this query which short-circuited:
> {noformat}
> SELECT * FROM by_sets WHERE datakeys CONTAINS 'a' AND datakeys CONTAINS 'c';
>  id | datakeys | datavars
> ----+----------+----------
>   1 |    {'a'} |    {'b'}
>   4 |    {'a'} |    {'z'}
> (2 rows)
> {noformat}
> Instead of receveing 3 rows, which match the datakeys CONTAINS 'a' AND datakeys CONTAINS
'c' we only got the first.
> Doing the same, but with CONTAINS 'c' first, ignores the second AND.
> {noformat}
> SELECT * FROM by_sets WHERE datakeys CONTAINS 'c' AND datakeys CONTAINS 'a' ;
>  id | datakeys | datavars
> ----+----------+----------
>   2 |    {'c'} |    {'d'}
> (1 rows)
> {noformat}
> Also, on a side-note, I have two indexes on both datakeys and datavars. But when trying
to run a query such as:
> {noformat}
> select * from by_sets WHERE datakeys CONTAINS 'a' AND datavars CONTAINS 'z';
> code=2200 [Invalid query] message="Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance. 
> If you want to execute this query despite the performance unpredictability, use ALLOW
FILTERING"
> {noformat}
> The second column, after AND (even if I inverse the order) requires an "allow filtering"
clause yet the column is indexed an an in-memory "join" of the primary keys of these sets
on the coordinator could build up the result.
> Could anyone explain the short-circuit behavior?
> And the requirement for "allow-filtering" on a secondly indexed column?
> If they're not bugs but intended they should be documented better, at least their limitations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message