Date: Mon, 22 Jun 2015 13:08:01 +0000 (UTC)
From: "Sylvain Lebresne (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9631) Unnecessary required filtering for query on indexed clustering key

    [ https://issues.apache.org/jira/browse/CASSANDRA-9631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595885#comment-14595885 ]

Sylvain Lebresne commented on CASSANDRA-9631:
---------------------------------------------

It's actually not a bug. The new index you've introduced might be used by Cassandra (whether it is or not depends on some internal metrics on your data set), and filtering will be involved if that index is used as the primary index. And so requiring {{ALLOW FILTERING}} is correct.
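
To make the reasoning above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Cassandra's planner code: the names {{Table}}, {{needs_filtering}} and {{requires_allow_filtering}} are hypothetical, and the rule "the restrictions left over after the index lookup must form a prefix of the primary key" is a simplification chosen to match the behaviour described in this ticket.

```python
# Toy model of the behaviour described in this comment. NOT Cassandra's actual
# planner code; every name below is hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    partition_key: tuple
    clustering_keys: tuple
    indexed: frozenset  # columns that have a secondary index

    @property
    def primary_key(self):
        return self.partition_key + self.clustering_keys


def needs_filtering(table, restricted, chosen_index):
    """Assumption: once `chosen_index` drives the read, the remaining equality
    restrictions avoid filtering only if they form a prefix of the primary key
    (ignoring the indexed column itself)."""
    remaining = [c for c in restricted if c != chosen_index]
    pk = [c for c in table.primary_key if c != chosen_index]
    return set(remaining) != set(pk[:len(remaining)])


def requires_allow_filtering(table, restricted):
    """Conservative check: any usable index might be picked, so the query is
    rejected as soon as one of them would involve filtering. Only models
    queries that restrict at least one indexed column."""
    candidates = [c for c in restricted if c in table.indexed]
    return any(needs_filtering(table, restricted, c) for c in candidates)


# Schema from the ticket: PRIMARY KEY (a, b, c), index on e, then also on b.
before = Table(("a",), ("b", "c"), frozenset({"e"}))
after = Table(("a",), ("b", "c"), frozenset({"b", "e"}))
query = ("a", "b", "e")  # WHERE a=1 AND b=1 AND e=3

print(requires_allow_filtering(before, query))  # False
print(requires_allow_filtering(after, query))   # True
```

In this model the query is accepted while only {{e}} is indexed, and rejected once {{b}} is indexed too, because driving the read from the index on {{b}} would leave the restriction on {{e}} to be filtered.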
Now, using the index on {{b}} for that query is not smart and C* should always stick to the index on {{e}} for that query as it will be faster and won't require filtering. It's just not what happens in 2.1. So we can definitely improve this.

That said, a (very) quick glance at 2.2/trunk seems to suggest that this improvement is actually made there. So [~blerer], can you double check if that is the case? If so, we can close this (as duplicate of CASSANDRA-7981?). Otherwise, let's look at improving it.

> Unnecessary required filtering for query on indexed clustering key
> ------------------------------------------------------------------
>
> Key: CASSANDRA-9631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9631
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Cassandra 2.1.6 vanilla; 3-node local cluster; OSX Yosemite 10.10.3; Installed with CCM.
> Reporter: Kevin Deldycke
> Labels: CQL, query, secondaryIndex
>
> Let's create and populate a simple table composed of one partition key {{a}}, two clustering keys {{b}} & {{c}}, and one secondary index on a standard column {{e}}:
> {code:sql}
> $ cqlsh 127.0.0.1
> Connected to test21 at 127.0.0.1:9160.
> [cqlsh 4.1.1 | Cassandra 2.1.6-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
> Use HELP for help.
> cqlsh> CREATE KEYSPACE test WITH REPLICATION={'class': 'SimpleStrategy', 'replication_factor': 3};
> cqlsh> CREATE TABLE test.table1 (
> ... a int,
> ... b int,
> ... c int,
> ... d int,
> ... e int,
> ... PRIMARY KEY (a, b, c)
> ... );
> cqlsh> CREATE INDEX table1_e ON test.table1 (e);
> cqlsh> INSERT INTO test.table1 (a, b, c, d, e) VALUES (1, 1, 1, 1, 1);
> (...)
> cqlsh> SELECT * FROM test.table1;
>  a | b | c | d | e
> ---+---+---+---+---
>  1 | 1 | 1 | 1 | 1
>  1 | 1 | 2 | 2 | 2
>  1 | 1 | 3 | 3 | 3
>  1 | 2 | 1 | 1 | 3
>  1 | 3 | 1 | 1 | 1
>  2 | 4 | 1 | 1 | 1
> (6 rows)
> {code}
> With such a schema, I am allowed to query on the indexed column without filtering by providing the first two elements of the primary key:
> {code:sql}
> cqlsh> SELECT * FROM test.table1 WHERE a=1 AND b=1 AND e=3;
>  a | b | c | d | e
> ---+---+---+---+---
>  1 | 1 | 3 | 3 | 3
> (1 rows)
> {code}
> Let's now introduce an index on the first clustering key:
> {code:sql}
> cqlsh> CREATE INDEX table1_b ON test.table1 (b);
> {code}
> Now, I expect the same query as above to work without filtering, but it does not:
> {code:sql}
> cqlsh> SELECT * FROM test.table1 WHERE a=1 AND b=1 AND e=3;
> Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
> {code}
> I think this is a bug in the way secondary indexes are accounted for when checking for unfiltered queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)