cassandra-commits mailing list archives

From "Alex Liu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6048) CQL3 data filtering improvement
Date Tue, 17 Sep 2013 23:42:51 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770171#comment-13770171 ]

Alex Liu edited comment on CASSANDRA-6048 at 9/17/13 11:42 PM:
---------------------------------------------------------------

Add an "ALLOW FILTERING JOIN" clause to CQL. By default, queries don't use join filtering, so the
end user can decide when to use it by specifying "ALLOW JOIN FILTERING".
                
      was (Author: alexliu68):
    Add "ALLOW FILTERING JOIN" to CQL clause. By default, it's not using join filtering, so
end user can decide when to use join filtering by specify "ALLOW JOIN FILTERING".
                  
> CQL3 data filtering improvement
> -------------------------------
>
>                 Key: CASSANDRA-6048
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6048
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Alex Liu
>
> Existing data filtering uses the following algorithm
> {code}
>    1. Find the most selective predicate, based on the smallest mean column count.
>    2. Fetch rows for that predicate, then filter the data on the remaining predicates.
> {code}
> Performance could potentially be improved by:
> {code}
>    1. joining multiple predicates, then filtering the data on the remaining predicates.
>    2. fine-tuning the algorithm that selects the best predicate.
> {code}
> Joining multiple predicates could improve performance when one predicate matches many entries
> and another matches very few. The query then needs only a few index CF reads: join the row
> keys, fetch only the matching rows, then filter on the other predicates.
> Another approach is to have an index on multiple columns.
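The sketch below compares the existing single-predicate strategy, the proposed multi-predicate join, and a multi-column index on a small in-memory data set. It is a minimal sketch only. Rows and indexes are plain Python dicts, not Cassandra's storage engine or API, and all predicates are equality predicates. `mean_entries` is a hypothetical stand-in for an index CF's mean columns count, and every function and data name here is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory model, not Cassandra's storage engine or API:
#   rows:    row key -> {column: value}
#   indexes: indexed column -> {value: set of row keys}  (one "index CF" each)
Row = Dict[str, object]
Index = Dict[object, Set[str]]
Predicate = Tuple[str, object]  # equality predicate: (column, value)


def build_index(rows: Dict[str, Row], column: str) -> Index:
    """Build a single-column secondary index: value -> row keys."""
    index: Index = {}
    for key, row in rows.items():
        if column in row:
            index.setdefault(row[column], set()).add(key)
    return index


def mean_entries(index: Index) -> float:
    """Average row keys per indexed value (stand-in for mean columns count)."""
    return sum(len(keys) for keys in index.values()) / max(len(index), 1)


def matches(row: Row, predicates: List[Predicate]) -> bool:
    return all(row.get(col) == val for col, val in predicates)


def single_predicate_query(rows, indexes, predicates):
    """Existing algorithm: read one index entry, then filter fetched rows.

    Assumes at least one predicate is on an indexed column.
    Returns (matching row keys, number of rows fetched).
    """
    indexed = [p for p in predicates if p[0] in indexes]
    # Pick the index with the smallest mean entry size; this ignores how
    # large the entry for this particular value actually is.
    best = min(indexed, key=lambda p: mean_entries(indexes[p[0]]))
    candidates = indexes[best[0]].get(best[1], set())
    rest = [p for p in predicates if p != best]
    result = sorted(k for k in candidates if matches(rows[k], rest))
    return result, len(candidates)


def join_query(rows, indexes, predicates):
    """Proposed join: intersect row keys from every indexed predicate,
    then fetch only the surviving rows and filter non-indexed predicates.

    Assumes at least one predicate is on an indexed column.
    Returns (matching row keys, number of rows fetched).
    """
    indexed = [p for p in predicates if p[0] in indexes]
    key_sets = sorted((indexes[c].get(v, set()) for c, v in indexed), key=len)
    candidates = set(key_sets[0])  # copy of the smallest entry
    for keys in key_sets[1:]:
        candidates &= keys  # join on row key
    rest = [p for p in predicates if p[0] not in indexes]
    result = sorted(k for k in candidates if matches(rows[k], rest))
    return result, len(candidates)


def build_composite_index(rows: Dict[str, Row], columns: Tuple[str, ...]):
    """Alternative: a single index keyed on several columns at once."""
    index: Dict[tuple, Set[str]] = {}
    for key, row in rows.items():
        if all(c in row for c in columns):
            index.setdefault(tuple(row[c] for c in columns), set()).add(key)
    return index


# Hypothetical sample data.
ROWS: Dict[str, Row] = {
    "r0": {"country": "US", "status": "gold"},
    "r1": {"country": "US", "status": "silver"},
    "r2": {"country": "FR", "status": "gold"},
    "r3": {"country": "FR", "status": "gold"},
    "r4": {"country": "FR", "status": "gold"},
    "r5": {"country": "FR", "status": "bronze"},
    "r6": {"country": "DE", "status": "silver"},
    "r7": {"country": "JP", "status": "bronze"},
}

if __name__ == "__main__":
    indexes = {c: build_index(ROWS, c) for c in ("country", "status")}
    query = [("country", "FR"), ("status", "bronze")]
    print(single_predicate_query(ROWS, indexes, query))  # (['r5'], 4)
    print(join_query(ROWS, indexes, query))  # (['r5'], 1)
    composite = build_composite_index(ROWS, ("country", "status"))
    print(sorted(composite[("FR", "bronze")]))  # ['r5']
```

With this data, the mean-based choice picks `country`, whose mean is 2.0 against about 2.67 for `status`. It therefore fetches all four FR rows. The join reads both index entries and fetches only `r5`. The join pays for an extra index read per predicate, so it helps mainly when one entry is small, as described above. The composite index avoids the join entirely, but in this model it would have to be built and maintained separately for each column combination.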

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
