[ https://issues.apache.org/jira/browse/CASSANDRA6933?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=13958264#comment13958264
]
Jonathan Ellis edited comment on CASSANDRA6933 at 4/2/14 10:44 PM:

I wrote a quick script and Benedict is right, assuming uniform distribution does help us pick
better bsearch "pivots" than just "half the remaining elements."
{code}
import bisect
import random
error_bisecting = 0
error_range = 0
for _ in xrange(10000):
universe = range(100)
sample = sorted(random.sample(universe, 10))
last_index = 0
r = None
for n in sample:
i = bisect.bisect(universe, n, last_index + 1)
guess_bisecting = (len(universe)  (last_index + 1)) / 2
guess_range = last_index + r if r else guess_bisecting
error_bisecting += abs(i  guess_bisecting)
error_range += abs(i  guess_range)
r = i  last_index
last_index = i
print error_range  error_bisecting
{code}
was (Author: jbellis):
I wrote a quick script and Benedict is right, assuming uniform distribution does help us pick
better bsearch "pivots" than just "half the remaining elements."
{code}
import bisect
import random
error_bisecting = 0
error_range = 0
for _ in xrange(10000):
universe = range(100)
sample = sorted(random.sample(universe, 10))
last_index = 0
r = None
for n in sample:
i = bisect.bisect(universe, n, last_index + 1)
guess_bisecting = (len(universe)  (last_index + 1)) / 2
guess_range = last_index + r if r else guess_bisecting
error_bisecting += abs(i  guess_bisecting)
error_range = abs(i  guess_range)
r = i  last_index
last_index = i
print error_range  error_bisecting
{code}
> Optimise Read Comparison Costs in collectTimeOrderedData
> 
>
> Key: CASSANDRA6933
> URL: https://issues.apache.org/jira/browse/CASSANDRA6933
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Priority: Minor
> Labels: performance
> Fix For: 2.1
>
> Attachments: 6933v3.txt, 6933v4.txt
>
>
> Introduce a new SearchIterator construct, which can be obtained from a ColumnFamily,
which permits efficiently iterating a subset of the cells in ascending order. Essentially,
it saves the previously visited position and searches from there, but also tries to avoid
searching the whole remaining space if possible.

This message was sent by Atlassian JIRA
(v6.2#6252)
