cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9843) Augment or replace partition index with adaptive range filters
Date Sat, 18 Jul 2015 04:08:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632257#comment-14632257
] 

Jonathan Ellis commented on CASSANDRA-9843:
-------------------------------------------

Rather than just adding an ARF per partition (the way we used to have a BF -- the difference
is that BF is not useful for scans but this would be), we may be able to adapt this further
by moving our index into the ARF.  Instead of just a bit indicating yes or no we could have
the offset for the start of each range [that we do have data for] in the leaf.

(The "adaptive" in ARF means you can tune it to index hot parts of the data range in greater
detail, without increasing the total memory used, at the cost of less detail for the cold
ranges.  We could do this in Cassandra as well, writing updated ARF to a new file.  This could
reduce the memory problems of pulling the indexes for very large partitions into memory. 
However, the paper describes very good results even without adaptation, so this is not required
for proof of concept.)

> Augment or replace partition index with adaptive range filters
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9843
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: T Jake Luciani
>              Labels: performance
>
> Adaptive range filters are, in principle, bloom filters for range queries.  They provide
a space-efficient way to avoid scanning a partition when we can tell that we do not contain
any data for the range requested.  Like BF, they can return false positives but not false
negatives.
> The implementation is of course totally different from BF.  ARF is a tree where each
leaf of the tree is a range of data and a bit, either on or off, denoting whether we have
*some* data in that range.
> ARF are described here: http://www.vldb.org/pvldb/vol6/p1714-kossmann.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message