cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
Date Sat, 29 Jul 2017 08:18:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106073#comment-16106073
] 

Benedict commented on CASSANDRA-9988:
-------------------------------------

Assuming your implementation is correct, here is a modified version that works
{code}
         int lb = forwards ? nextPos : lowerBound; // inclusive
         int ub = forwards ? upperBound : nextPos; // inclusive
         int bound = 0;
         int c = -1;
         while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], key))
< 0)
             bound = Math.max(bound * 2, 1);
         if (c == 0) { return lb + bound; } // note, by checking == 0, we do not need to include
it in bsearch, so we do not need to increment ub
        return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, ub), key, comparator);
{code}

This implementation as a whole could be cleaned up a little, and there are a variety of ways
to bit-twiddle to get {{bound == 0}} on first iteration, but this one is at least very clear.
 A "faster" variant would be:

{code}
         while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + (bound
/ 2)], key)) < 0)
             bound *= 2;
{code}

A faster-still variant would just offset lb by -1 at the start, so there are no recurring
costs, and restore it after the loop.  But better than this would be something like:

{code}
         int lb = forwards ? nextPos : lowerBound; // inclusive
         int ub = forwards ? upperBound : nextPos; // inclusive
         int c = -1;
         int probe_ub = lb;
         int probe_incr = 1;
         while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key))
< 0) {
             lb = probe_ub;
             probe_ub += probe_incr;
             probe_incr *= 2;
         }
         if (c == 0) { return probe_ub; }
        return Arrays.binarySearch(keys, lb, probe_ub, key, comparator);
{code}

better still
I haven't looked at the codebase in a long time, and don't have it checked out, so haven't
consulted the branch iterator version, and don't actually recall precisely what algorithm
we used.  I don't recall it being a particularly elegant approach, though.  In fact, it's
possible it only compares the first element and then resorts to binary search;  I do not recall
for sure if we carried exponential search over from the ArrayBackedColumns (or whatever it
used to be called).

I will note the exponential search as it stands here probably does not behave very well for
reverse iteration.

Also, apologies for against-style-guide variable names.  But this comment box is too poor
an editor to revise it any further.

> Introduce leaf-only iterator
> ----------------------------
>
>                 Key: CASSANDRA-9988
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>            Assignee: Jay Zhuang
>            Priority: Minor
>              Labels: patch
>             Fix For: 4.0
>
>         Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 9988-result.png,
9988-test-result3.png, 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 9988-test-result.xlsx,
9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf page. In this
case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message