cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin Deldycke (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-9631) Unnecessary required filtering for query on indexed clustering key
Date Mon, 22 Jun 2015 12:34:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-9631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kevin Deldycke updated CASSANDRA-9631:
--------------------------------------
    Description: 
Let's create and populate a simple table composed of one partition key {{a}}, two clustering
keys {{b}} & {{c}}, and one secondary index on a standard column {{e}}:

{code:sql}
$ cqlsh 127.0.0.1
Connected to test21 at 127.0.0.1:9160.
[cqlsh 4.1.1 | Cassandra 2.1.6-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE test
   ... WITH REPLICATION = {
   ...     'class': 'SimpleStrategy',
   ...     'replication_factor': 3,
   ... };
cqlsh> CREATE TABLE test.table1 (
   ...     a int,
   ...     b int,
   ...     c int,
   ...     d int,
   ...     e int,
   ...     PRIMARY KEY (a, b, c)
   ... );
cqlsh> CREATE INDEX table1_e ON test.table1 (e);
cqlsh> INSERT INTO test.table1 (a, b, c, d, e) VALUES (1, 1, 1, 1, 1);
(...)
cqlsh> SELECT * FROM test.table1;

 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 | 1 | 1
 1 | 1 | 2 | 2 | 2
 1 | 1 | 3 | 3 | 3
 1 | 2 | 1 | 1 | 3
 1 | 3 | 1 | 1 | 1
 2 | 4 | 1 | 1 | 1

(7 rows)
{code}

With such a schema, I am allowed to query on the indexed column without filtering by providing
the first two elements of the primary key:
{code:sql}
cqlsh> SELECT * FROM test.table1 WHERE a=1 AND b=1 AND e=3;

 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 3 | 3 | 3

(1 rows)
{code}

Let's now introduce an index on the first clustering key:
{code:sql}
cqlsh> CREATE INDEX table1_b ON test.table1 (b);
{code}

Now, I expect the same query as above to work without filtering, but it's not:
{code:sql}
cqlsh> SELECT * FROM test.unindexed_table WHERE a=1 AND b=1 AND e=3;
Bad Request: Cannot execute this query as it might involve data filtering and thus may have
unpredictable performance. If you want to execute this query despite the performance unpredictability,
use ALLOW FILTERING
{code}

I think this is a bug on the way secondary indexes are accounted for when checking for unfiltered
queries.

  was:
Let's create and populate a simple table composed of one partition key {{a}}, two clustering
keys {{b}} & {{c}}, and one secondary index on a standard column {{e}}:

{code:sql}
$ cqlsh 127.0.0.1
Connected to test21 at 127.0.0.1:9160.
[cqlsh 4.1.1 | Cassandra 2.1.6-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE test
   ... WITH REPLICATION = {
   ...     'class': 'SimpleStrategy',
   ...     'replication_factor': 3,
   ...     };
cqlsh> CREATE TABLE test.table1 (
   ...     a int,
   ...     b int,
   ...     c int,
   ...     d int,
   ...     e int,
   ...     PRIMARY KEY (a, b, c)
   ... );
cqlsh> CREATE INDEX table1_e ON test.table1 (e);
cqlsh> INSERT INTO test.table1 (a, b, c, d, e) VALUES (1, 1, 1, 1, 1);
(...)
cqlsh> SELECT * FROM test.table1;

 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 | 1 | 1
 1 | 1 | 2 | 2 | 2
 1 | 1 | 3 | 3 | 3
 1 | 2 | 1 | 1 | 3
 1 | 3 | 1 | 1 | 1
 2 | 4 | 1 | 1 | 1

(7 rows)
{code}

With such a schema, I am allowed to query on the indexed column without filtering by providing
the first two elements of the primary key:
{code:sql}
cqlsh> SELECT * FROM test.table1 WHERE a=1 AND b=1 AND e=3;

 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 3 | 3 | 3

(1 rows)
{code}

Let's now introduce an index on the first clustering key:
{code:sql}
cqlsh> CREATE INDEX table1_b ON test.table1 (b);
{code}

Now, I expect the same query as above to work without filtering, but it's not:
{code:sql}
cqlsh> SELECT * FROM test.unindexed_table WHERE a=1 AND b=1 AND e=3;
Bad Request: Cannot execute this query as it might involve data filtering and thus may have
unpredictable performance. If you want to execute this query despite the performance unpredictability,
use ALLOW FILTERING
{code}

I think this is a bug on the way secondary indexes are accounted for when checking for unfiltered
queries.


> Unnecessary required filtering for query on indexed clustering key
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-9631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9631
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.1.6 vanilla; 3-node local cluster; OSX Yosemite 10.10.3;
Installed with CCM.
>            Reporter: Kevin Deldycke
>              Labels: CQL, query, secondaryIndex
>
> Let's create and populate a simple table composed of one partition key {{a}}, two clustering
keys {{b}} & {{c}}, and one secondary index on a standard column {{e}}:
> {code:sql}
> $ cqlsh 127.0.0.1
> Connected to test21 at 127.0.0.1:9160.
> [cqlsh 4.1.1 | Cassandra 2.1.6-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
> Use HELP for help.
> cqlsh> CREATE KEYSPACE test
>    ... WITH REPLICATION = {
>    ...     'class': 'SimpleStrategy',
>    ...     'replication_factor': 3,
>    ... };
> cqlsh> CREATE TABLE test.table1 (
>    ...     a int,
>    ...     b int,
>    ...     c int,
>    ...     d int,
>    ...     e int,
>    ...     PRIMARY KEY (a, b, c)
>    ... );
> cqlsh> CREATE INDEX table1_e ON test.table1 (e);
> cqlsh> INSERT INTO test.table1 (a, b, c, d, e) VALUES (1, 1, 1, 1, 1);
> (...)
> cqlsh> SELECT * FROM test.table1;
>  a | b | c | d | e
> ---+---+---+---+---
>  1 | 1 | 1 | 1 | 1
>  1 | 1 | 2 | 2 | 2
>  1 | 1 | 3 | 3 | 3
>  1 | 2 | 1 | 1 | 3
>  1 | 3 | 1 | 1 | 1
>  2 | 4 | 1 | 1 | 1
> (7 rows)
> {code}
> With such a schema, I am allowed to query on the indexed column without filtering by
providing the first two elements of the primary key:
> {code:sql}
> cqlsh> SELECT * FROM test.table1 WHERE a=1 AND b=1 AND e=3;
>  a | b | c | d | e
> ---+---+---+---+---
>  1 | 1 | 3 | 3 | 3
> (1 rows)
> {code}
> Let's now introduce an index on the first clustering key:
> {code:sql}
> cqlsh> CREATE INDEX table1_b ON test.table1 (b);
> {code}
> Now, I expect the same query as above to work without filtering, but it's not:
> {code:sql}
> cqlsh> SELECT * FROM test.unindexed_table WHERE a=1 AND b=1 AND e=3;
> Bad Request: Cannot execute this query as it might involve data filtering and thus may
have unpredictable performance. If you want to execute this query despite the performance
unpredictability, use ALLOW FILTERING
> {code}
> I think this is a bug on the way secondary indexes are accounted for when checking for
unfiltered queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message