cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6633) Investigate Bloom Filter Improvements
Date Wed, 29 Jan 2014 12:02:09 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885278#comment-13885278
] 

Benedict commented on CASSANDRA-6633:
-------------------------------------

It's worth noting I had a quick hacky experiment with Golomb-Rice encoded sets last night,
and found the space requirements for the same FP ratio was about the same. It's possible this
can be improved upon with some work, though.



> Investigate Bloom Filter Improvements
> -------------------------------------
>
>                 Key: CASSANDRA-6633
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6633
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>
> Investigate various possible improvements to our bloom filters:
> 1) Dynamic resizing would be useful. There are a few ways this could be achieved: with
some modification, downsampling could be supported; partitioning the hash functions so that
we may select the number of hashes/bits dynamically, by loading/unloading a given partition;
and there are some related data structures, such as Quotient Filters that support resizing
and merging.
> 2) Faster loading: should be possible to mmap the bloom filter representation on disk
> 3) Most ambitious of all, would be to try to reduce the memory requirement of bloom filters.
Golomb Coded Sets [1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
are a possibility, as are other compressed hash structures [2|http://www.it-c.dk/people/pagh/papers/bloom.pdf].



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message