cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-6933) Optimise Read Comparison Costs in collectTimeOrderedData
Date Wed, 02 Apr 2014 21:19:15 GMT


Jonathan Ellis updated CASSANDRA-6933:

    Attachment: 6933-v3.txt

I agree that in the best case this is a good optimization; I'm just not convinced that real-world
use cases will much resemble the best case.  In particular, in CollationController
the container is guaranteed to have only the columns the filter is looking for, so we expect
to have a lot of sequential "runs" of matches when compaction is working well.  On the other
hand, once we've found "most" matches and are looking for the last handful, there's no particular
reason to expect that these last ones will be evenly distributed across the container space.
(Sure, they will be "on average," but the variance is high enough to make that useless as
a guideline.)

v3 removes the range heuristic and fixes incrementing i on a hit.

> Optimise Read Comparison Costs in collectTimeOrderedData
> --------------------------------------------------------
>                 Key: CASSANDRA-6933
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1
>         Attachments: 6933-v3.txt
> Introduce a new SearchIterator construct, obtainable from a ColumnFamily,
that permits efficiently iterating over a subset of the cells in ascending order. Essentially,
it saves the previously visited position and searches from there, but also tries to avoid
searching the whole remaining space if possible.
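
The idea in the description can be illustrated with a minimal sketch. It is not the code from the attached patch: the class name ForwardSearchIterator, its methods, and the use of a plain sorted array are all assumptions made for illustration. To avoid searching the whole remaining space it uses a galloping (exponential) probe followed by a bounded binary search, which is one possible technique and not necessarily the one the patch uses.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; not Cassandra's SearchIterator.
final class ForwardSearchIterator<T> {
    private final T[] sorted;               // elements in ascending order
    private final Comparator<? super T> cmp;
    private int from = 0;                   // first index not yet ruled out

    ForwardSearchIterator(T[] sorted, Comparator<? super T> cmp) {
        this.sorted = sorted;
        this.cmp = cmp;
    }

    boolean hasNext() {
        return from < sorted.length;
    }

    // Keys must be requested in ascending order.
    // Returns the matching element, or null if the key is absent.
    T next(T key) {
        // Gallop forward from the saved position, doubling the step,
        // until we reach an element >= key or run off the end.
        int lo = from;
        int hi = from;
        int step = 1;
        while (hi < sorted.length && cmp.compare(sorted[hi], key) < 0) {
            lo = hi + 1;
            hi += step;
            step <<= 1;
        }
        // Binary search only the window the gallop bracketed.
        int end = Math.min(hi + 1, sorted.length);
        int idx = Arrays.binarySearch(sorted, lo, end, key, cmp);
        if (idx >= 0) {
            from = idx + 1;                 // on a hit, step past the match
            return sorted[idx];
        }
        from = -idx - 1;                    // on a miss, resume at the insertion point
        return null;
    }

    public static void main(String[] args) {
        Integer[] cells = {1, 3, 5, 7, 9, 11};
        ForwardSearchIterator<Integer> it =
            new ForwardSearchIterator<>(cells, Comparator.<Integer>naturalOrder());
        for (int key : new int[] {3, 4, 9, 12}) {
            System.out.println(key + " -> " + it.next(key));
        }
    }
}
```

When matches come in sequential runs, the gallop stops after one or two comparisons; when the next match is far away, the cost grows with the logarithm of the distance skipped, not with the size of the remaining container.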

This message was sent by Atlassian JIRA
