cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-68) Bloom filters have much higher false-positive rate than expected
Date Mon, 13 Apr 2009 15:17:15 GMT


Jonathan Ellis commented on CASSANDRA-68:

Prashant wrote on the mailing list:

"The results are a bit counterintuitive here. I would have expected it to be faster with the
same FP rate, but I am not sure why it is slower if you are just using a couple of hash functions
and using double hashing... I am sorry I haven't looked at the test code, but have you tried
it with large strings as keys, e.g. 128-byte keys, and also with Longs?"

I replied:

"Murmur is a higher-quality hash and takes more operations to achieve its better key distribution.
But since the new implementation always uses two calls to Murmur, no matter how many hashes
are needed, it is virtually constant time. The random strings generated are 128 bytes."
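
For readers unfamiliar with the double-hashing approach discussed above, here is a minimal sketch of the general idea, not the code from the attached patches. Two base hashes are computed once per key, and every further bucket index is derived arithmetically as (h1 + i * h2) mod m, so the hashing cost does not grow with the number of hashes. The names base_hashes, bucket_indexes, and BloomFilter are hypothetical, and a BLAKE2b digest stands in for the two Murmur calls because the Python standard library has no Murmur implementation.

```python
import hashlib


def base_hashes(key):
    # Assumption: one 16-byte BLAKE2b digest, split in half, stands in for
    # the two Murmur calls mentioned in the thread.
    digest = hashlib.blake2b(key, digest_size=16).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:], "big")
    return h1, h2


def bucket_indexes(key, hash_count, bucket_count):
    # Double hashing: the i-th index is (h1 + i * h2) mod m.
    # The key is hashed once, however large hash_count is.
    h1, h2 = base_hashes(key)
    return [(h1 + i * h2) % bucket_count for i in range(hash_count)]


class BloomFilter:
    """Illustrative bit-array Bloom filter (hypothetical, not Cassandra's)."""

    def __init__(self, bucket_count, hash_count):
        self.bucket_count = bucket_count
        self.hash_count = hash_count
        self.bits = bytearray((bucket_count + 7) // 8)

    def add(self, key):
        for idx in bucket_indexes(key, self.hash_count, self.bucket_count):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def is_present(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[idx // 8] & (1 << (idx % 8))
            for idx in bucket_indexes(key, self.hash_count, self.bucket_count)
        )
```

With this layout, raising hash_count from 2 to 10 adds only a few integer operations per key, which is consistent with the "virtually constant time" remark in the reply.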

> Bloom filters have much higher false-positive rate than expected
> ----------------------------------------------------------------
>                 Key: CASSANDRA-68
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.3
>         Attachments: 0001-r-m-unused-code-including-entire-CountingBloomFilte.patch,
> 0002-replace-JenkinsHash-w-MurmurHash.-its-hash-distrib.patch, 0003-rename-BloomFilter.fill-add.patch,
> 0004-rewrite-bloom-filters-to-use-murmur-hash-and-combina.patch, 0004-v3.patch, 0004a-tests.patch,
> 0004b-code.patch, 0005-switch-back-to-old-hash-generation-code-to-demonstra.patch, fp_test_for_old_code.patch,
> fp_test_for_old_code_v2.patch, words.gz
> Gory details:

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
