cassandra-commits mailing list archives

From Michał Michalski (JIRA) <>
Subject [jira] [Commented] (CASSANDRA-4986) Allow finer control of ALLOW FILTERING behavior
Date Sat, 06 Dec 2014 21:05:12 GMT


Michał Michalski commented on CASSANDRA-4986:

From what I understand:

LIMIT defines the maximum number of rows we want to return. If there are rows matching your
query, they're guaranteed to be returned (up to the LIMIT), but finding them all may take a
long time, depending on the dataset size. You will get a correct result, but there's no guarantee
on the execution time.

MAX defines the maximum number of rows we want to "iterate over" (even if none of them
matches your query). Even if there are rows matching your query, they might not be returned
if C* would have to iterate over too many (> MAX) rows to find them. This guarantees that
the execution time of your query will not be worse than what it takes to iterate over MAX
rows, but you might get an incomplete result (assuming the more useful implementation, see point
1 in the description).
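
To make the difference concrete, here is a minimal Python sketch of the two bounds as I
understand them. It is not Cassandra code: filtered_scan, limit and max_scanned are
hypothetical names, and it assumes the variant that returns the rows found so far together
with a flag saying the scan was cut short.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def filtered_scan(
    rows: Iterable[int],
    predicate: Callable[[int], bool],
    limit: Optional[int] = None,        # plays the role of LIMIT (hypothetical name)
    max_scanned: Optional[int] = None,  # plays the role of MAX (hypothetical name)
) -> Tuple[List[int], bool]:
    """Return (matching rows, short_circuited).

    limit bounds how many rows are returned; max_scanned bounds how many
    rows are iterated over, whether or not they match.
    """
    matches: List[int] = []
    scanned = 0
    for row in rows:
        # LIMIT reached: the result is complete up to the limit.
        if limit is not None and len(matches) >= limit:
            break
        # MAX reached with rows still left: stop and report a partial result.
        if max_scanned is not None and scanned >= max_scanned:
            return matches, True
        scanned += 1
        if predicate(row):
            matches.append(row)
    return matches, False


rows = range(1000)
rare = lambda r: r % 400 == 399  # only rows 399 and 799 match

# LIMIT only: both matches are found, after scanning most of the data.
print(filtered_scan(rows, rare, limit=2))          # ([399, 799], False)
# MAX only: work is bounded at 500 rows, so row 799 is never reached.
print(filtered_scan(rows, rare, max_scanned=500))  # ([399], True)
```

With limit alone the cost depends on where the matches sit in the data; with max_scanned
alone the cost is bounded, but the caller has to check the flag to know whether the result
is complete.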

> Allow finer control of ALLOW FILTERING behavior
> -----------------------------------------------
>                 Key: CASSANDRA-4986
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 3.0
> CASSANDRA-4915 added {{ALLOW FILTERING}} to warn people when they do potentially inefficient
> queries. However, as discussed in the former issue, it would be interesting to allow controlling
> that mode more precisely by allowing something like:
> {noformat}
> {noformat}
> whose behavior would be that the query would be short-circuited if it filters (i.e. reads
> but discards from the ResultSet) more than 500 CQL3 rows.
> There are, however, 2 details I'm not totally clear on:
> # what to do exactly when we reach the max filtering allowed. Do we return what we have
> so far? Then we need a way to say in the result set that the query was short-circuited.
> Or do we just throw an exception TooManyFiltered (simpler but maybe a little bit less useful)?
> # what about deleted records? Should we count them as 'filtered'? Imho the logical thing
> is to not count them as filtered, since after all we "filter them out" in the normal path
> (i.e. even when ALLOW FILTERING is not used).
