cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jay Zhuang (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
Date Mon, 07 Aug 2017 05:01:00 GMT


Jay Zhuang commented on CASSANDRA-9988:

Here are the test results to search target on the different index with the following search
1. binary search
2. exponential search
3. binary search optimized for next adjacent value

Here are my takes from the results:
h4. 1. binary search:
  * best on average and evenly distributed;
  * Simple code, use JAVA native API.

  * not optimized to search for the adjacent value.

h4. 2. exponential search:
  * optimized to search the first and second value for {{Dir.ASC}}.

  * very bad average performance: compare to binarySearch it has only half the throughput.
{{Dir.DESC}} will have the same performance degradation and we do use DESC for some places;
  * complicate code, hard to maintain: For example, there's a bug in above code, it should
be {{while (prob_ub <= ub && (c =\[probe_ub], key)) <
0)}} it only impacts the tree size {{2^n - 1}} and search for the last element, it's tricky
to find out.

I think we could make the algorithm work for {{Dir.DESC}} too, but it will make the code even
more complicated.

h4. 3. binary search optimized for next adjacent value
  * Optimized to search next adjacent value, works for both {{Dir.ASC}} and {{Dir.DESC}}

  * Average performance is a litter bit worse than binarySearch

I'm not familiar with the whole cassandra code base, so not sure how often we search for the
adjacent value. Personally, I would still prefer binarySearch:
* consistency with other cassandra code, plus it's the current behave right now for btree
* smaller mean deviations, could be good for tp99;
* clean code.
But I'm fine with any solutions:
| 1. binary search | [9988-trunk|] |
| 2. exponential search | [9988-trunk-exp|]
| 3. binary search optimized| [9988-trunk-x|]

> Introduce leaf-only iterator
> ----------------------------
>                 Key: CASSANDRA-9988
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>            Assignee: Jay Zhuang
>            Priority: Minor
>              Labels: patch
>             Fix For: 4.0
>         Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 9988-result3.png,
9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png,
9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
> In many cases we have small btrees, small enough to fit in a single leaf page. In this
case it _may_ be more efficient to specialise our iterator.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message