cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
Date Mon, 07 Aug 2017 08:39:00 GMT


Benedict commented on CASSANDRA-9988:

bq. very bad average performance

Could you clarify exactly what the graph is showing?  Is it what we agreed to test, i.e. a
full consumption of the iterator via calls to {{next(<key>)}} ?  Because I'm pretty
sure the "average performance" of binarysearch in this case is worse by a much larger margin.

We really need a range of data points to make a decision on, so we can put the whole thing
to bed for good.

We can also (if we care) tune exp. search's variance by setting the first distance it jumps.
 We currently jump to the immediately following item - if we jump 4 items first, we take between
3 and 7 comparisons, as opposed to between 1 and 9.  However, there is advantage to higher
variance, since the lower cost operations are going to be encountered more likely when we
consume multiple items from the iterator. 

bq. optimized to search the first and second value

exp. search is optimised for any _sequence_ of values.  i.e., if we want just a random 8 items
in a 32 item iterator, we should expect exp. search to win on average.  *If you would like,
you could modify your test overall to simply search for a random number of items, randomly
selected from the iterator's contents.*  This should favour exp. search, esp. as the number
of random items increases.  In real systems, selections are more likely to bias towards adjacent
items than purely random, so any such advantage will probably be understated.

bq. complicate code, hard to maintain

I did indicate my changes assumed your implementation was valid (I did not attempt to verify
it, and the original looks to be the source of the bug).  Since there was minimal effort on
both our parts to just test the hypothesis, I don't think this should be a major concern.
 This code snippet is trivial to add perfect test coverage for, so we could randomly generate
code until it worked if we wanted.

bq. Dir.DESC will have ... performance degradation 

Clearly, the algorithm as stands would not be viable for DESC, which I did indicate in my
prior comment.  I had hoped we could simply validate its behaviour and then decide if it is
worth writing one that is agnostic to direction.  

bq. I'm not familiar with the whole cassandra code base, so not sure how often we search for
the adjacent value

As previously indicated, I'm pretty sure we, for _every query_, consume iterators in a manner
that touches adjacent (or nearly) items for most of an iterator's contents.  We do this for
result row serialization and deserialization, and we do it for query column selection.  i.e.,
we do this a great deal.  That is, unless my recollection is faulty or the codebase has changed

> Introduce leaf-only iterator
> ----------------------------
>                 Key: CASSANDRA-9988
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>            Assignee: Jay Zhuang
>            Priority: Minor
>              Labels: patch
>             Fix For: 4.0
>         Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 9988-result3.png,
9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png,
9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
> In many cases we have small btrees, small enough to fit in a single leaf page. In this
case it _may_ be more efficient to specialise our iterator.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message