cassandra-commits mailing list archives

From "Alex Petrov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12915) SASI: Index intersection can be very inefficient
Date Thu, 24 Nov 2016 14:14:58 GMT


Alex Petrov commented on CASSANDRA-12915:

I don't think we can just drop it; we should fix it. As far as I understand, it should
find the expression with the fewest sstables for the given data range (but I might be mistaken),
so we should investigate a bit deeper.

bq. Skipping the following indexes if we already found one with less tokens than command.limits().count()

This one most likely won't work for CONTAINS queries, since we do not know how many items
will get filtered out in the end. That said, all iterators are lazy, so just having
them in the correct order (from low to high cardinality, so that we fetch tokens for the low
cardinality ones and skip to those tokens in the higher cardinality indexes) and having filtering
turned on where applicable should suffice. Once that is fixed, we can discuss whether this
is still a problem.
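
To illustrate the ordering argument, here is a minimal Python sketch, not SASI's actual code. `TokenIterator`, `skip_to` and `intersect` are hypothetical names, tokens are plain integers held in memory, and the `seeks` counter is only a rough stand-in for the cost of looking up a token on disk. The lowest cardinality iterator drives the intersection, and the others are only asked to skip to its tokens.

```python
from bisect import bisect_left


class TokenIterator:
    """Cursor over a sorted list of tokens (hypothetical stand-in for one index)."""

    def __init__(self, tokens):
        self.tokens = sorted(tokens)
        self.pos = 0
        self.seeks = 0  # number of skip_to calls, a rough proxy for seeks

    def __len__(self):
        return len(self.tokens)

    def current(self):
        """Return the token under the cursor, or None when exhausted."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        self.pos += 1

    def skip_to(self, token):
        """Move to the first token >= `token` and return it (None if exhausted)."""
        self.seeks += 1
        self.pos = bisect_left(self.tokens, token, self.pos)
        return self.current()


def intersect(iterators):
    """Intersect sorted iterators, driving from the lowest cardinality one."""
    ordered = sorted(iterators, key=len)
    lead, others = ordered[0], ordered[1:]
    result = []
    while lead.current() is not None:
        candidate = lead.current()
        matched = True
        for other in others:
            found = other.skip_to(candidate)
            if found is None:
                return result  # one side is exhausted, nothing more can match
            if found != candidate:
                lead.skip_to(found)  # candidate is missing, jump the lead forward
                matched = False
                break
        if matched:
            result.append(candidate)
            lead.advance()
    return result


# Sizes loosely modelled on the issue: 2 matches against ~300k matches.
small = TokenIterator([5, 250_000])
large = TokenIterator(range(300_000))
matches = intersect([large, small])
print(matches, large.seeks)
```

With this ordering the large iterator is asked to skip only once per token of the small one, so the number of seeks is bounded by the size of the smaller result rather than the larger.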

bq. Ordering expressions with a score

From my basic understanding (I haven't written SASI, only worked on some subset of it),
that should help. However, it has to be very well tested (tracing range iterators and understanding
whether the order changes the number of seeks), benchmarked and checked for correctness.

bq. Do you have a good suggestion to do that without doing the search?

It's not available for now. But since there will be a format change in the next version, we
could add it.

> SASI: Index intersection can be very inefficient
> ------------------------------------------------
>                 Key: CASSANDRA-12915
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>             Fix For: 3.x
> It looks like index intersection can be pretty inefficient in some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the intersection (and effectively only uses 'index1'), the query will run in a few tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection when we know it will be slow. Probably when the range size factor is very small and the range size is big.
> * CASSANDRA-10765
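
The static threshold idea from the description could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the function name `select_ranges`, the parameters `min_ratio` and `big_size`, their default values, and the premise that an estimated match count per expression is available before searching (which, per the comment above, is not the case today). A dropped expression would presumably still have to be applied as a filter on the rows that were fetched.

```python
def select_ranges(sizes, min_ratio=0.01, big_size=10_000):
    """Pick the index expressions worth including in the intersection.

    `sizes` maps an expression name to its estimated match count.
    An expression is dropped when its range is big (>= big_size) and the
    smallest range is tiny relative to it (ratio below min_ratio).
    Both thresholds are hypothetical and would need benchmarking.
    """
    if not sizes:
        return []
    smallest = min(sizes.values())
    keep = []
    for name, size in sizes.items():
        too_expensive = size >= big_size and smallest / size < min_ratio
        if not too_expensive:
            keep.append(name)
    return keep


# Estimates taken from the example query in the description.
estimates = {"index1": 2, "index2": 300_000}
selected = select_ranges(estimates)
print(selected)
```

For the example query, only `index1` survives the check, which mirrors the patched behaviour described in the issue.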
