cassandra-commits mailing list archives

From "Srini (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6377) ALLOW FILTERING should allow seq scan filtering
Date Sat, 20 Jun 2015 01:04:02 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594225#comment-14594225 ]

Srini commented on CASSANDRA-6377:
----------------------------------

Just to be clear what I'm referring to, let me give an example.

Primary Key (Key1, Key2, Key3, Key4, Key5)

where Key1 is the partition key and Key2 through Key5 are clustering columns.

Assume this is what we want to do:

Select * from merchant_data where Key1 = 'abc' and Key2 = 'xxxx' and Key4 = 'yyyy'
 ALLOW FILTERING;

This is what the current version of Cassandra allows:
Select * from merchant_data where Key1 = 'abc' and Key2 = 'xxxx';

The difference between the two is that with the second query the application has to filter
on Key4 in its own logic, whereas Cassandra (had it allowed the first query) could have done
that filtering itself.

The performance difference could be large, because rows that fail the Key4 predicate would
never have to cross the network between the Cassandra node and the client. Reducing the use
of secondary indexes and relying on the core strengths of Cassandra would be extremely
beneficial for Cassandra's adaptability across many use cases.
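
To make the comparison concrete, here is a minimal Python sketch of the two approaches. It is only a model, not Cassandra code: the partition contents, the `Row` type and both helper functions are hypothetical and exist purely for illustration. It counts how many rows would have to leave the node in each case.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Mirrors the primary key (Key1, Key2, Key3, Key4, Key5) from the example.
    key1: str
    key2: str
    key3: str
    key4: str
    key5: str


def build_partition():
    """Made-up contents of the partition Key1 = 'abc' (illustrative data only)."""
    return [
        Row("abc", key2, f"k3-{i}", key4, f"k5-{i}")
        for key2 in ("xxxx", "zzzz")
        for key4 in ("yyyy", "wwww")
        for i in range(3)
    ]


def client_side_filter(partition, key2, key4):
    # Second query: the node ships every row matching Key1 and Key2 ...
    shipped = [r for r in partition if r.key2 == key2]
    # ... and the application discards the rows whose Key4 does not match.
    result = [r for r in shipped if r.key4 == key4]
    return result, len(shipped)


def server_side_filter(partition, key2, key4):
    # First query (the proposed behaviour): the node applies the Key4
    # predicate before responding, so only matching rows are shipped.
    shipped = [r for r in partition if r.key2 == key2 and r.key4 == key4]
    return shipped, len(shipped)


partition = build_partition()
client_rows, client_shipped = client_side_filter(partition, "xxxx", "yyyy")
server_rows, server_shipped = server_side_filter(partition, "xxxx", "yyyy")

# Both approaches return the same rows; only the transfer size differs.
assert client_rows == server_rows
print(client_shipped, server_shipped)  # prints "6 3"
```

In this toy partition the client-side approach moves twice as many rows as the result needs. The real ratio depends entirely on how selective the Key4 predicate is for the data in question.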

I do see how this could be abused if the partition contains thousands of rows, but by requiring
the ALLOW FILTERING clause, the burden would be on the client, who has to make a conscious
decision.


> ALLOW FILTERING should allow seq scan filtering
> -----------------------------------------------
>
>                 Key: CASSANDRA-6377
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6377
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>              Labels: cql
>             Fix For: 3.x
>
>
> CREATE TABLE emp_table2 (
>         empID int PRIMARY KEY,
>         firstname text,
>         lastname text,
>         b_mon text,
>         b_day text,
>         b_yr text,
> );
> INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) 
>    VALUES (100,'jane','doe','oct','31','1980');
> INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) 
>    VALUES (101,'john','smith','jan','01','1981');
> INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) 
>    VALUES (102,'mary','jones','apr','15','1982');
> INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) 
>    VALUES (103,'tim','best','oct','25','1982');
>    
> SELECT b_mon,b_day,b_yr,firstname,lastname FROM emp_table2 
>     WHERE b_mon='oct' ALLOW FILTERING;
> Bad Request: No indexed columns present in by-columns clause with Equal operator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
