cassandra-commits mailing list archives

From "Alex Petrov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
Date Wed, 08 Mar 2017 09:13:38 GMT


Alex Petrov commented on CASSANDRA-12915:

I've looked at the code once again, and it turns out that we can't rely on disjointness to determine
whether or not to return an empty iterator. In the union case, we want to return just the iterators
that produce results (empty ones won't produce any anyway). In the intersection case, even though an
empty iterator overlaps with every set, we should make a distinction, since the intersection with an
empty iterator is empty. I missed this yesterday, and my tests were passing only by chance (the
intersections were disjoint by themselves anyway).
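To make the distinction above concrete, here is a minimal Java sketch over plain sorted token arrays. It is not the SASI code: the class and method names are hypothetical, and `long[]` stands in for a token iterator. Union skips empty inputs, while intersection returns an empty result as soon as any input is empty, without consulting bounds or disjointness at all.

```java
import java.util.Arrays;
import java.util.TreeSet;

/**
 * Hypothetical illustration of how empty inputs should be treated when
 * combining sorted token lists. Not the SASI implementation.
 */
public final class EmptyRangeSemantics {

    /** Union: empty inputs contribute nothing, so they are simply skipped. */
    static long[] union(long[]... ranges) {
        TreeSet<Long> merged = new TreeSet<>();
        for (long[] range : ranges) {
            if (range.length == 0) {
                continue; // an empty iterator yields no tokens; drop it
            }
            for (long token : range) {
                merged.add(token);
            }
        }
        return merged.stream().mapToLong(Long::longValue).toArray();
    }

    /** Intersection: a single empty input makes the whole result empty. */
    static long[] intersection(long[]... ranges) {
        if (ranges.length == 0) {
            return new long[0];
        }
        for (long[] range : ranges) {
            if (range.length == 0) {
                return new long[0]; // short-circuit before touching any other input
            }
        }
        long[] result = ranges[0];
        for (int i = 1; i < ranges.length && result.length > 0; i++) {
            result = intersectSorted(result, ranges[i]);
        }
        return result;
    }

    /** Two-pointer intersection of two sorted, duplicate-free arrays. */
    static long[] intersectSorted(long[] a, long[] b) {
        long[] out = new long[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        long[] foo = {10, 42};
        long[] bar = {1, 2, 10, 42, 99};
        long[] empty = {};
        System.out.println(Arrays.toString(union(foo, empty, bar)));   // [1, 2, 10, 42, 99]
        System.out.println(Arrays.toString(intersection(foo, bar)));   // [10, 42]
        System.out.println(Arrays.toString(intersection(foo, empty))); // []
    }
}
```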

I've addressed the issues [here|].

A couple of comments on motivation:
  * more tests, to make sure we cover more cases
  * one of the problems the new tests revealed was that the original patch was yielding a bounce
intersection iterator (which does have min/max set), but with an empty range. Now we consistently
return an empty iterator that has neither min nor max set.
  * I wanted to avoid distinguishing between the first range and the rest, mostly so that all
ranges go through the same code path
  * hopefully it is now clearer when the empty iterator is returned
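A minimal sketch of the bounds side of this, assuming each range is summarized by a nullable min/max token pair (the `Bounds` and `IntersectionBounds` names are hypothetical, not SASI classes). Every range, including the first, goes through the same loop because min and max start from the identity values `Long.MIN_VALUE` and `Long.MAX_VALUE`; an empty input or crossed bounds yields a shared `EMPTY` value with neither min nor max set.

```java
/**
 * Hypothetical sketch: compute the bounds of an intersection in one pass,
 * treating every input the same way (no special case for the first range).
 */
public final class IntersectionBounds {

    static final class Bounds {
        final Long min; // null means "no tokens"
        final Long max;

        Bounds(Long min, Long max) {
            this.min = min;
            this.max = max;
        }

        boolean isEmpty() {
            return min == null;
        }

        @Override
        public String toString() {
            return isEmpty() ? "EMPTY" : "[" + min + ", " + max + "]";
        }
    }

    /** Shared empty result: neither min nor max is set. */
    static final Bounds EMPTY = new Bounds(null, null);

    static Bounds intersect(Bounds... ranges) {
        if (ranges.length == 0) {
            return EMPTY;
        }
        long min = Long.MIN_VALUE; // identity element for Math.max
        long max = Long.MAX_VALUE; // identity element for Math.min
        for (Bounds r : ranges) {
            if (r.isEmpty()) {
                return EMPTY; // intersecting with an empty range is empty
            }
            min = Math.max(min, r.min);
            max = Math.min(max, r.max);
            if (min > max) {
                return EMPTY; // bounds no longer overlap: nothing can match
            }
        }
        return new Bounds(min, max);
    }

    public static void main(String[] args) {
        System.out.println(intersect(new Bounds(1L, 50L), new Bounds(10L, 99L))); // [10, 50]
        System.out.println(intersect(new Bounds(1L, 5L), new Bounds(10L, 99L)));  // EMPTY
        System.out.println(intersect(new Bounds(1L, 50L), EMPTY));                // EMPTY
    }
}
```

Returning one shared `EMPTY` instance, rather than an intersection object whose range happens to be empty, means callers only ever need the single `isEmpty()` check.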

Could you take another look at the patch and see if we have common ground here?
Thank you once again for the clarifications and discussions: it's a complex problem that was hard
to discover and isn't simple to tackle from all sides simultaneously.

> SASI: Index intersection with an empty range really inefficient
> ---------------------------------------------------------------
>                 Key: CASSANDRA-12915
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>             Fix For: 3.11.x, 4.x
> It looks like index intersection can be pretty inefficient in some cases.
Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query takes ~1 sec, with most of the time spent in disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the intersection (and
effectively only uses 'index1'), the query runs in a few tens of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid using the index for the intersection when we know
it will be slow, probably when the range size factor is very small and the range size is big.
> * CASSANDRA-10765
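The static-threshold idea from the description could look roughly like the following Java sketch. It assumes that "range size factor" means the ratio of the smaller match count to the larger one; the class name, method name and both threshold values are hypothetical and would need tuning. It also assumes that when the larger index is skipped, the remaining predicate (here `index2 = 'bar'`) is still checked against the few candidate rows, so results stay correct.

```java
/**
 * Hypothetical static threshold: skip using an index for the intersection
 * when it is both large and much larger than the smallest index.
 * Names and values are illustrative assumptions, not Cassandra settings.
 */
public final class IntersectionThreshold {

    // Illustrative values only; they would need tuning against real workloads.
    static final double MIN_SIZE_RATIO = 0.001;
    static final long MIN_LARGE_SIZE = 10_000;

    /**
     * Returns true when intersecting with the larger index is likely to cost
     * more than filtering the smaller candidate set directly.
     */
    static boolean skipLargerIndex(long smallerCount, long largerCount) {
        if (largerCount <= 0) {
            return false; // nothing to compare against
        }
        double ratio = (double) smallerCount / largerCount;
        return ratio < MIN_SIZE_RATIO && largerCount > MIN_LARGE_SIZE;
    }

    public static void main(String[] args) {
        // Counts from the description: index1 = 'foo' matches 2, index2 = 'bar' ~300k.
        System.out.println(skipLargerIndex(2, 300_000));     // true: tiny ratio, large index
        System.out.println(skipLargerIndex(5_000, 300_000)); // false: ratio not small enough
        System.out.println(skipLargerIndex(1, 5_000));       // false: larger index is small
    }
}
```

With the counts from the description (2 vs ~300k matches), the ratio is about 7e-6, so this heuristic would skip the larger index; requiring both conditions keeps the intersection in place when the larger index is cheap to scan anyway.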

This message was sent by Atlassian JIRA
