cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-9150) BloomFilter false-positive-ratio does not match false-positive-chance
Date Fri, 10 Apr 2015 11:52:12 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Stupp resolved CASSANDRA-9150.
-------------------------------------
    Resolution: Invalid

/me banging head against the wall 1000000 times
(It was a bug in my test code ... obviously)

false-positive-ratio is fine

> BloomFilter false-positive-ratio does not match false-positive-chance
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-9150
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9150
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>
> As part of CASSANDRA-8413 I wrote a test that checks bloom filter false-positive-ratio
and compares it to the target false-positive-chance.
> [The test|https://github.com/snazy/cassandra/blob/8413-bffp/test/unit/org/apache/cassandra/utils/BloomFilterTest.java#L359]
basically creates a BF using {{FilterFactory.getFilter}} with the number of elements to be
added and the FPC. It then adds that number of elements to the filter. It checks the false-positive-ratio
by calling {{IFilter.isPresent}} using not-added keys.
> It feels that the FPR increases linearly with the number of elements and exceeds the
FPC for filters with more than 1,000,000 elements. Filters with 100M elements have an FPR
of .94, which is probably bad.
> {code}
> invertedHash=false fpc=0,010000 fpr=0,000107 for    Int32Type with      10000 elements,
bitset capacity=    100032, spec=BloomSpecification(K=5, bucketsPerElement=10)
> invertedHash=false fpc=0,010000 fpr=0,000931 for    Int32Type with     100000 elements,
bitset capacity=   1000064, spec=BloomSpecification(K=5, bucketsPerElement=10)
> invertedHash=false fpc=0,010000 fpr=0,009405 for    Int32Type with    1000000 elements,
bitset capacity=  10000064, spec=BloomSpecification(K=5, bucketsPerElement=10)
> invertedHash=false fpc=0,010000 fpr=0,093752 for    Int32Type with   10000000 elements,
bitset capacity= 100000064, spec=BloomSpecification(K=5, bucketsPerElement=10)
> invertedHash=false fpc=0,010000 fpr=0,942765 for    Int32Type with  100000000 elements,
bitset capacity=1000000064, spec=BloomSpecification(K=5, bucketsPerElement=10)
> {code}
> Do I measure something wrong here?
> EDIT: it does not depend on the type or the FPC - it's the same pattern for other types
and other FPCs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message