cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-68) Bloom filters have much higher false-positive rate than expected
Date Fri, 10 Apr 2009 12:37:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697799#action_12697799 ]

Jonathan Ellis commented on CASSANDRA-68:
-----------------------------------------

I added comments to getHashBuckets, which is the core of the change.  Those who are not
familiar with Bloom filters in general are referred to Wikipedia. :)

The normal BloomFilter will be the same size as the old one; both are backed by a BitSet,
which is about as space-efficient as you can get.
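
To illustrate why a BitSet-backed filter costs one bit per bucket, here is a minimal sketch.
It is not the code from the attached patches: the class name SimpleBloomFilter, the method
bucketsFor, and the way the two base hashes are derived from String.hashCode are all
assumptions made for illustration. Bucket indexes are formed as h1 + i * h2, a common way
to get several indexes from two hashes.

```java
import java.util.BitSet;

/** Hypothetical sketch of a BitSet-backed Bloom filter (one bit per bucket). */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBuckets;
    private final int numHashes;

    SimpleBloomFilter(int numBuckets, int numHashes) {
        this.bits = new BitSet(numBuckets);
        this.numBuckets = numBuckets;
        this.numHashes = numHashes;
    }

    /** Assumed scheme: derive numHashes bucket indexes from two base hashes. */
    private int[] bucketsFor(String key) {
        int h1 = key.hashCode();
        // Second hash is a cheap scramble of the first; forced odd so it is never zero.
        int h2 = (Integer.reverse(h1) * 0x9E3779B9) | 1;
        int[] buckets = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            // floorMod keeps the index non-negative even when the sum overflows.
            buckets[i] = Math.floorMod(h1 + i * h2, numBuckets);
        }
        return buckets;
    }

    void add(String key) {
        for (int b : bucketsFor(key)) {
            bits.set(b);
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String key) {
        for (int b : bucketsFor(key)) {
            if (!bits.get(b)) {
                return false;
            }
        }
        return true;
    }
}
```

A real filter would use a stronger hash than String.hashCode; the point here is only the
storage layout.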

The new CountingBloomFilter will be half the size of the old one, since the old one uses a
full byte per bucket and this one uses half a byte (4 bits).  (If a bucket reaches a count of
15, your filter is far too small to be useful anyway; there is no reason to allow a count of 255.)
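
The half-byte layout can be sketched as follows. This is an illustration under assumptions,
not the patch itself: the class name NibbleCounters and its methods are hypothetical, and the
choice to saturate at 15 rather than overflow is one reasonable reading of the comment above.
Two 4-bit counters share each byte, so N buckets take about N/2 bytes.

```java
/** Hypothetical sketch: 4-bit saturating counters packed two per byte. */
final class NibbleCounters {
    private static final int MAX_COUNT = 15; // largest value that fits in 4 bits
    private final byte[] data;

    NibbleCounters(int buckets) {
        // Round up so an odd number of buckets still gets a final half-used byte.
        this.data = new byte[(buckets + 1) / 2];
    }

    int get(int bucket) {
        int b = data[bucket >>> 1] & 0xFF;
        // Even buckets live in the low nibble, odd buckets in the high nibble.
        return (bucket & 1) == 0 ? (b & 0x0F) : (b >>> 4);
    }

    private void set(int bucket, int value) {
        int idx = bucket >>> 1;
        int b = data[idx] & 0xFF;
        if ((bucket & 1) == 0) {
            b = (b & 0xF0) | value;
        } else {
            b = (b & 0x0F) | (value << 4);
        }
        data[idx] = (byte) b;
    }

    /** Increments the counter, stopping at 15 instead of wrapping around. */
    void increment(int bucket) {
        int count = get(bucket);
        if (count < MAX_COUNT) {
            set(bucket, count + 1);
        }
    }

    /** Decrements the counter, never going below zero. */
    void decrement(int bucket) {
        int count = get(bucket);
        if (count > 0) {
            set(bucket, count - 1);
        }
    }

    int sizeInBytes() {
        return data.length;
    }
}
```

For example, 1000 buckets occupy 500 bytes here, versus 1000 bytes with one byte per bucket.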

> Bloom filters have much higher false-positive rate than expected
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-68
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-68
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>         Attachments: 0001-r-m-unused-code-including-entire-CountingBloomFilte.patch,
>                      0002-replace-JenkinsHash-w-MurmurHash.-its-hash-distrib.patch,
>                      0003-rename-BloomFilter.fill-add.patch,
>                      0004-rewrite-bloom-filters-to-use-murmur-hash-and-combina.patch,
>                      0004a-tests.patch, 0004b-code.patch
>
>
> Gory details: http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

