cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-9167) Improve bloom-filter false-positive-ratio
Date Fri, 10 Apr 2015 13:12:12 GMT
Robert Stupp created CASSANDRA-9167:
---------------------------------------

             Summary: Improve bloom-filter false-positive-ratio
                 Key: CASSANDRA-9167
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9167
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Robert Stupp
            Assignee: Robert Stupp
            Priority: Minor


{{org.apache.cassandra.utils.BloomCalculations}} performs some table lookups to calculate
the bloom filter specification (size, # of hashes). Using the exact maths for that computation
brings a better false-positive-ratio (the maths usually returns higher numbers for hash-counts).

TL;DR increasing the number of hash-rounds brings a nice improvement. Finally it's a trade-off
between CPU and I/O.

||false-positive-chance||elements||capacity||hash count new||false-positive-ratio new||hash
count current||false-positive-ratio current||improvement
|0.1|10000|50048|3|0.0848|3|0.0848|0
|0.1|100000|500032|3|0.09203|3|0.09203|0
|0.1|1000000|5000064|3|0.0919|3|0.0919|0
|0.1|10000000|50000064|3|0.09182|3|0.09182|0
|0.1|100000000|500000064|3|0.091874|3|0.091874|0
|0.01|10000|100032|7|0.0092|5|0.0107|0.1630434783
|0.01|100000|1000064|7|0.00818|5|0.00931|0.1381418093
|0.01|1000000|10000064|7|0.008072|5|0.009405|0.1651387512
|0.01|10000000|100000064|7|0.008174|5|0.009375|0.146929288
|0.01|100000000|1000000064|7|0.008197|5|0.009428|0.150176894
|0.001|10000|150080|10|0.0008|7|0.001|0.25
|0.001|100000|1500032|10|0.0006|7|0.00094|0.5666666667
|0.001|1000000|15000064|10|0.000717|7|0.000991|0.3821478382
|0.001|10000000|150000064|10|0.000743|7|0.000992|0.33512786
|0.001|100000000|1500000064|10|0.000741|7|0.001002|0.3522267206
|0.0001|10000|200064|13|0|10|0.0002|#DIV/0!
|0.0001|100000|2000064|13|0.00004|10|0.0001|1.5
|0.0001|1000000|20000064|13|0.000075|10|0.000091|0.2133333333
|0.0001|10000000|200000064|13|0.000069|10|0.000087|0.2608695652
|0.0001|100000000|2000000064|13|0.000068|10|0.00009|0.3235294118

If we decide to allow more hash-rounds, it could be nicely back-ported even to 2.0 without
affecting existing sstables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message