cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-6633) Investigate Bloom Filter Improvements
Date Wed, 29 Jan 2014 12:00:14 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Benedict updated CASSANDRA-6633:
--------------------------------

    Description: 
Investigate various possible improvements to our bloom filters:

1) Dynamic resizing would be useful. There are a few ways this could be achieved: with some
modification, downsampling could be supported; partitioning the hash functions so that we
may select the number of hashes/bits dynamically, by loading/unloading a given partition;
and there are some related data structures, such as Quotient Filters that support resizing
and merging.
2) Faster loading: should be possible to mmap the bloom filter representation on disk
3) Most ambitious of all, would be to try to reduce the memory requirement of bloom filters.
Golomb Coded Sets [1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
are a possibility, as are other compressed hash structures [2|http://www.it-c.dk/people/pagh/papers/bloom.pdf].




  was:
Investigate various possible improvements to our bloom filters:

1) Dynamic resizing would be useful. There are a few ways this could be achieved: with some
modification, downsampling could be supported; partitioning the hash functions so that we
may select the number of hashes/bits dynamically, by loading/unloading a given partition;
and there are some related data structures, such as Quotient Filters that support resizing
and merging.
2) Faster loading: should be possible to mmap the bloom filter representation on disk
3) Most ambitious of all, would be to try to reduce the memory requirement of bloom filters.
Golomb Coded Sets [1] are a possibility, as are other compressed hash structures.

[1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
[2|http://www.it-c.dk/people/pagh/papers/bloom.pdf]




> Investigate Bloom Filter Improvements
> -------------------------------------
>
>                 Key: CASSANDRA-6633
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6633
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>
> Investigate various possible improvements to our bloom filters:
> 1) Dynamic resizing would be useful. There are a few ways this could be achieved: with
some modification, downsampling could be supported; partitioning the hash functions so that
we may select the number of hashes/bits dynamically, by loading/unloading a given partition;
and there are some related data structures, such as Quotient Filters that support resizing
and merging.
> 2) Faster loading: should be possible to mmap the bloom filter representation on disk
> 3) Most ambitious of all, would be to try to reduce the memory requirement of bloom filters.
Golomb Coded Sets [1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
are a possibility, as are other compressed hash structures [2|http://www.it-c.dk/people/pagh/papers/bloom.pdf].



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message