cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-68) Bloom filters have much higher false-positive rate than expected
Date Fri, 10 Apr 2009 12:37:15 GMT


Jonathan Ellis commented on CASSANDRA-68:

I added comments to getHashBuckets, which is the core of the change.  Those who are not
familiar with Bloom filters in general are referred to Wikipedia. :)

The normal BloomFilter will be the same size as the old one; both are based on a BitSet,
which is about as efficient as you can get.
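
For readers who want to see what a BitSet-backed filter looks like, here is a minimal sketch. It is
not the code from the attached patches: the class name SimpleBloomFilter, the buckets() helper, and
the way the bucket indexes are derived (two base hashes combined as h1 + i*h2, with String.hashCode()
standing in for a real hash function) are all assumptions made for illustration.

```java
import java.util.BitSet;

// Hypothetical illustration class; not the Cassandra BloomFilter.
final class SimpleBloomFilter {
    private final BitSet bits;       // one bit per bucket
    private final int bucketCount;
    private final int hashCount;

    SimpleBloomFilter(int bucketCount, int hashCount) {
        this.bits = new BitSet(bucketCount);
        this.bucketCount = bucketCount;
        this.hashCount = hashCount;
    }

    // Assumption: derive hashCount bucket indexes from two base hashes.
    // String.hashCode() is only a stand-in for a stronger hash function.
    private int[] buckets(String key) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1; // forced odd, so never zero
        int[] result = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            // floorMod keeps the index non-negative even when the sum overflows
            result[i] = Math.floorMod(h1 + i * h2, bucketCount);
        }
        return result;
    }

    void add(String key) {
        for (int bucket : buckets(key)) {
            bits.set(bucket);
        }
    }

    // False means "definitely absent"; true means "possibly present".
    boolean mightContain(String key) {
        for (int bucket : buckets(key)) {
            if (!bits.get(bucket)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the BitSet stores one bit per bucket, changing how the bucket indexes are computed does not
change the memory footprint of the filter.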

The CountingBloomFilter will be half the size of the old one, since the old one uses a full
byte per bucket and this one uses half a byte.  (If you reach a count of 15, your filter is way
too small to be useful anyway; there is no reason to allow a count of 255.)
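
The half-byte layout can be sketched as follows. This is an illustration only, not the patch itself:
the class name NibbleCounters and its methods are hypothetical, and capping the counter at 15 on
increment is an assumed saturation policy. Two 4-bit counters share each byte, so the array needs
half as many bytes as there are buckets.

```java
// Hypothetical illustration class; not the Cassandra CountingBloomFilter.
final class NibbleCounters {
    private static final int MAX_COUNT = 15; // largest value that fits in 4 bits
    private final byte[] packed;             // two 4-bit counters per byte

    NibbleCounters(int bucketCount) {
        this.packed = new byte[(bucketCount + 1) / 2];
    }

    int get(int bucket) {
        int b = packed[bucket / 2] & 0xFF;
        // even buckets use the low nibble, odd buckets the high nibble
        return (bucket % 2 == 0) ? (b & 0x0F) : (b >>> 4);
    }

    private void set(int bucket, int value) {
        int index = bucket / 2;
        int b = packed[index] & 0xFF;
        if (bucket % 2 == 0) {
            b = (b & 0xF0) | value;
        } else {
            b = (b & 0x0F) | (value << 4);
        }
        packed[index] = (byte) b;
    }

    // Assumed policy: a counter that reaches 15 stays at 15 instead of wrapping.
    void increment(int bucket) {
        int count = get(bucket);
        if (count < MAX_COUNT) {
            set(bucket, count + 1);
        }
    }

    // Never goes below zero. A counter that was capped at 15 has lost information,
    // so decrementing it afterwards may undercount.
    void decrement(int bucket) {
        int count = get(bucket);
        if (count > 0) {
            set(bucket, count - 1);
        }
    }

    int sizeInBytes() {
        return packed.length;
    }
}
```

With 10 buckets, for example, this layout needs 5 bytes where a byte-per-bucket layout needs 10.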

> Bloom filters have much higher false-positive rate than expected
> ----------------------------------------------------------------
>                 Key: CASSANDRA-68
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>         Attachments: 0001-r-m-unused-code-including-entire-CountingBloomFilte.patch,
>                      0002-replace-JenkinsHash-w-MurmurHash.-its-hash-distrib.patch,
>                      0003-rename-BloomFilter.fill-add.patch,
>                      0004-rewrite-bloom-filters-to-use-murmur-hash-and-combina.patch,
>                      0004a-tests.patch, 0004b-code.patch
> Gory details:

