cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6633) Investigate Bloom Filter Improvements
Date Thu, 13 Mar 2014 21:41:43 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934132#comment-13934132
] 

Benedict commented on CASSANDRA-6633:
-------------------------------------

Well, (1) and (2) should actually be pretty easy, and maybe very beneficial. Probably worth
doing for 3.0, as it should permit scaling more gracefully to larger datasets.

Probably worth splitting out investigating Golomb encoding and other optimisations for Later
though.

> Investigate Bloom Filter Improvements
> -------------------------------------
>
>                 Key: CASSANDRA-6633
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6633
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>
> Investigate various possible improvements to our bloom filters:
> 1) Dynamic resizing would be useful. There are a few ways this could be achieved: with
some modification, downsampling could be supported; partitioning the hash functions so that
we may select the number of hashes/bits dynamically, by loading/unloading a given partition;
and there are some related data structures, such as Quotient Filters that support resizing
and merging.
> 2) Faster loading: should be possible to mmap the bloom filter representation on disk
> 3) Most ambitious of all, would be to try to reduce the memory requirement of bloom filters.
Golomb Coded Sets [1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
are a possibility, as are other compressed hash structures [2|http://www.it-c.dk/people/pagh/papers/bloom.pdf].



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message