cassandra-commits mailing list archives

From "sankalp kohli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7085) Specialized query filters for CQL3
Date Mon, 11 Aug 2014 22:30:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093456#comment-14093456
] 

sankalp kohli commented on CASSANDRA-7085:
------------------------------------------

Using SliceFilter for all CQL queries is very bad for performance. For large CQL partitions,
it has to touch every sstable for which the bloom filter says yes. This causes response times to
grow with the amount of data or the number of levels. Here is a simple example that shows how bad it is:

CREATE TABLE test (
A int,
B int,
C int,
PRIMARY KEY (A, B)
);

INSERT INTO test(A,B,C) values(1,2,3);
nodetool flush
INSERT INTO test(A,B,C) values(1,2,4);   
select c from test where A=1 and B=2; 
This query, instead of being served entirely from the memtable, actually touched the sstable.
We verified this through tracing and also by debugging the code.
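
The difference can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's code: `Source`, `slice_read` and `names_read` are hypothetical names, each source is reduced to a dict of cells with timestamps, and the early stop in `names_read` assumes that a source whose maximum timestamp is older than every cell already found cannot change the result.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Toy stand-in for the memtable or a single sstable (hypothetical model)."""
    name: str
    cells: dict  # cell name -> (timestamp, value)

    @property
    def max_timestamp(self):
        # Newest write held by this source; -inf if the source is empty.
        return max((ts for ts, _ in self.cells.values()), default=float("-inf"))


def merge(result, src, wanted):
    """Keep the newest version of each wanted cell seen so far."""
    for cell in wanted:
        if cell in src.cells and (cell not in result or src.cells[cell][0] > result[cell][0]):
            result[cell] = src.cells[cell]


def slice_read(sources, wanted):
    """Slice-style read: consult every source, then reconcile by timestamp."""
    touched, result = [], {}
    for src in sources:
        touched.append(src.name)
        merge(result, src, wanted)
    return result, touched


def names_read(sources, wanted):
    """Names-style read: newest source first, stop once older sources cannot win."""
    touched, result = [], {}
    for src in sorted(sources, key=lambda s: s.max_timestamp, reverse=True):
        found_all = len(result) == len(wanted)
        if found_all and all(ts > src.max_timestamp for ts, _ in result.values()):
            break  # everything wanted is already newer than anything in this source
        touched.append(src.name)
        merge(result, src, wanted)
    return result, touched


# Mirrors the example above: the first INSERT was flushed, the second is in the memtable.
sstable = Source("sstable", {"c": (1, 3)})
memtable = Source("memtable", {"c": (2, 4)})

print(slice_read([memtable, sstable], ["c"]))  # ({'c': (2, 4)}, ['memtable', 'sstable'])
print(names_read([memtable, sstable], ["c"]))  # ({'c': (2, 4)}, ['memtable'])
```

Both reads return the same value, but only the slice-style read has to open the sstable.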

> Specialized query filters for CQL3
> ----------------------------------
>
>                 Key: CASSANDRA-7085
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7085
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>              Labels: cql, perfomance
>             Fix For: 3.0
>
>
> The semantics of CQL make it so that the current {{NamesQueryFilter}} and {{SliceQueryFilter}}
are not always as efficient as they could be. Namely, when a {{SELECT}} only selects a handful
of columns, we still have to query all the columns of the selected rows to distinguish
between 'live row but with no data for the queried columns' and 'no row' (see CASSANDRA-6588
for more details).
> We can solve that, however, by adding new filters (name and slice) specialized for CQL.
The new name filter would be a list of row prefixes + a list of CQL column names (instead of
one list of cell names). The slice filter would still take a ColumnSlice[] but would add the
list of column names we care about for each row.
> The new sstable readers that go with those filters would use the list of column names
to filter out all the cells we don't care about, so we don't have to ship those back to the
coordinator to skip them there, yet would know to still return the row marker when necessary.
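
A minimal sketch of the per-row filtering described above, under stated assumptions: `filter_row` and `ROW_MARKER` are hypothetical names, a row is modelled as a plain dict of CQL column name to value, the row marker is modelled as a cell with an empty name, and "when necessary" is read here as "when none of the queried columns has data". None of this is taken from the Cassandra code base.

```python
ROW_MARKER = ""  # assumption: the row marker is modelled as a cell with an empty name


def filter_row(cells, wanted_columns):
    """Keep only the queried columns, falling back to the row marker.

    Returning the marker lets the caller tell 'live row but with no data for
    the queried columns' (marker only) apart from 'no row' (empty result).
    """
    kept = {name: value for name, value in cells.items() if name in wanted_columns}
    if not kept and ROW_MARKER in cells:
        kept[ROW_MARKER] = cells[ROW_MARKER]
    return kept


row = {ROW_MARKER: None, "b": 2, "c": 3}

print(filter_row(row, {"c"}))  # {'c': 3}
print(filter_row(row, {"d"}))  # {'': None}
print(filter_row({}, {"d"}))   # {}
```

With a filter of this shape, the unqueried cell `b` is dropped at the replica instead of being shipped to the coordinator.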



--
This message was sent by Atlassian JIRA
(v6.2#6252)
