cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-68) Bloom filters have much higher false-positive rate than expected
Date Fri, 10 Apr 2009 21:50:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697991#action_12697991
] 

Jonathan Ellis commented on CASSANDRA-68:
-----------------------------------------

Fine.  Here is a simple test that demonstrates the problem on the old code that you could
have written in about a minute:

public class BloomFilterTest
{
    @Test
    public void testHashes() {
        Set<Integer> hashResults = new HashSet<Integer>();
        for (ISimpleHash hash : BloomFilter.hashLibrary_) {
            hashResults.add(hash.hash("1"));
        }
        assert hashResults.size() == BloomFilter.hashLibrary_.size() : hashResults.size();
    }
}

This shows that the old hash functions only generate 5 unique values (from 11 functions).
 A little inspection will show that only 2 of those are in the first six, as I mentioned above.

If that doesn't tell you "oh yeah the false positive rate will be the roof" then you need
to read the wikipedia page on Bloom filters. :)

> Bloom filters have much higher false-positive rate than expected
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-68
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-68
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.3
>
>         Attachments: 0001-r-m-unused-code-including-entire-CountingBloomFilte.patch,
0002-replace-JenkinsHash-w-MurmurHash.-its-hash-distrib.patch, 0003-rename-BloomFilter.fill-add.patch,
0004-rewrite-bloom-filters-to-use-murmur-hash-and-combina.patch, 0004a-tests.patch, 0004b-code.patch
>
>
> Gory details: http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message