cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
Date Tue, 07 Mar 2017 07:41:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898877#comment-15898877
] 

Alex Petrov edited comment on CASSANDRA-12915 at 3/7/17 7:40 AM:
-----------------------------------------------------------------

Sure, what I'm saying is that behaviour is unnecessary to fix the problem that we have. SASI
Range iterators are taking the shortest range.

For example, if we have records like 

|| a || b || c ||
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |

And {{b}} and {{c}} are SASI-indexed, and we want to run the query {{WHERE b = 2 AND c = 1}},
we will get 2 iterators, one with {{2}} (for the column {{b}}) and the second one with {{1,2,3}}
(for the column {{c}}). Now, SASI will take the shortest range ({{b}}) and start iterating
by "rewinding" the {{c}} iterator to the token {{2}}. If there's no item for the token, intersection
will be empty. 

Now, moving closer to the problem. If we had just imbalanced iterators (one returning 100
results and the other just 1), SASI would be able to efficiently intersect ranges and hit
the storage to retrieve just a single row. In the case with __empty__ results from one of
the iterators, the one you're fixing, we were simply using a single iterator, and had to hit
the storage a 100 times (assuming the other iterator returns 100 results). Now, with what
I propose, we will hit is 0 times. Same with your approach, although with a lot of complex
logic that I can not justify, so I have asked you to clarify. The trick was simply to "replace"
that [null check|https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba#diff-25d7f486e2818c56d6b01aa952d459f3L146]
with an actual iterator, to let the intersection know there's a second part, and it's empty.
I thought it was already discussed in [this comment|https://issues.apache.org/jira/browse/CASSANDRA-12915?focusedCommentId=15728412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15728412]
but I'm happy to re-iterate:

bq. The way RangeIterator work and how it's optimised, we're [picking|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L234-L237]
the "shortest" token tree and start skipping the second token tree based on the results retrieved
from the first one. So one of the indexes will be iterated anyways. The second index (since
it's larger) might  have to fetch/load roughly 10 blocks (since they might be located far
from one another on disk), but it never has to fetch all 2M items. It'll iterate only as many
items as the smallest index has. For example, two index queries would skip through (left is
which index is used, right is index of the item within the token tree):

and 

bq. Moreover, RangeIterator already has [different optimisation strategies|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L76-L78]
based on differences in cardinality. I'd say that current benchmark shows that the query is
slightly slower (since we have to go through around twice as much data on disk). But given
the numbers at hand  the difference that is small and it's sub-millisecond, this optimisation
seems not to pay the complexity that it brings.

That's one of the ideas behind SASI, pretty much: to be able to efficiently merge, iterate
and skip iterators.

Looks like we're mostly on the same page. I've ran (relatively) large scale tests with a variant
of the patch I've posted (160+GB of data), and it works exactly as expected.  Now, I want
to simplify the patch and make sure we do not add the code we do not need.  

So there's no need to add additional logic to hide empty ranges, it's already handled. 


was (Author: ifesdjeen):
Sure, what I'm saying is that behaviour is unnecessary to fix the problem that we have. SASI
Range iterators are taking the shortest range.

For example, if we have records like 

|| a || b || c ||
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |

And {{b}} and {{c}} are SASI-indexed, and we want to run the query {{WHERE b = 2 AND c = 1}},
we will get 2 iterators, one with {{2}} (for the column {{b}}) and the second one with {{1,2,3}}
(for the column {{c}}). Now, SASI will take the shortest range ({{b}}) and start iterating
by "rewinding" the {{c}} iterator to the token {{2}}. If there's no item for the token, intersection
will be empty. 

Now, moving closer to the problem. If we had just imbalanced iterators (one returning 100
results and the other just 1), SASI would be able to efficiently intersect ranges and hit
the storage to retrieve just a single row. In the case with __empty__ results from one of
the iterators, the one you're fixing, we were simply using a single iterator, and had to hit
the storage a 100 times (assuming the other iterator returns 100 results). Now, with what
I propose, we will hit is 0 times. Same with your approach, although with a lot of complex
logic that I can not justify, so I have asked you to clarify. The trick was simply to "replace"
that [null check|https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba#diff-25d7f486e2818c56d6b01aa952d459f3L146]
with an actual iterator, to let the intersection know there's a second part, and it's empty.
I thought it was already discussed in [this comment|https://issues.apache.org/jira/browse/CASSANDRA-12915?focusedCommentId=15728412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15728412]
but I'm happy to re-iterate:

bq. The way RangeIterator work and how it's optimised, we're [picking|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L234-L237]
the "shortest" token tree and start skipping the second token tree based on the results retrieved
from the first one. So one of the indexes will be iterated anyways. The second index (since
it's larger) might  have to fetch/load roughly 10 blocks (since they might be located far
from one another on disk), but it never has to fetch all 2M items. It'll iterate only as many
items as the smallest index has. For example, two index queries would skip through (left is
which index is used, right is index of the item within the token tree):

and 

bq. Moreover, RangeIterator already has [different optimisation strategies|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L76-L78]
based on differences in cardinality. I'd say that current benchmark shows that the query is
slightly slower (since we have to go through around twice as much data on disk). But given
the numbers at hand  the difference that is small and it's sub-millisecond, this optimisation
seems not to pay the complexity that it brings.

That's one of the ideas behind SASI, pretty much: to be able to efficiently merge, iterate
and skip iterators.

Looks like we're mostly on the same page. I've ran large-scale tests with a variant of the
patch I've posted (160+GB of data), and it works exactly as expected.  Now, I want to simplify
the patch and make sure we do not add the code we do not need.  

So there's no need to add additional logic to hide empty ranges, it's already handled. 

> SASI: Index intersection with an empty range really inefficient
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-12915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>             Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java and be pretty inefficient in some cases.
Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in disk.TokenTree.getTokenAt().
> if I patch RangeIntersectionIterator so that it doesn't try to do the intersection (and
effectively only use 'index1') the query will run in a few tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static thresold to avoid the use of the index for the intersection when we know
it will be slow. Probably when the range size factor is very small and the range size is big.
> * CASSANDRA-10765



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message