cassandra-user mailing list archives

From Ryan King <r...@twitter.com>
Subject Re: High BloomFilterFalseRation
Date Tue, 02 Nov 2010 19:14:14 GMT
On Tue, Nov 2, 2010 at 1:28 AM, Daniel Doubleday
<daniel.doubleday@gmx.net> wrote:
> Hi all
>
> I had some time yesterday to dig a little deeper, and maybe this saves some time
> for someone who made the same mistake, so ...
>
> I tried to reproduce the problem in unit tests with the same data, which led nowhere
> because every single result was almost exactly what the math promised. Along the way I
> stumbled upon http://sites.google.com/site/murmurhash/murmurhash2flaw and thought
> "omg, all is lost" ... but I finally found that everything is just fine.
>
> Turns out that the JMX BloomFilterFalseRatio simply does not show what I expected.
> I thought it would be a quality measure of how well the bloom filter works in terms
> of hit rate, which would be (Unnecessary File Lookups / Total Lookups). But it is
> (False Positives / (False Positives + True Positives)), which means it does not count
> the lookups that were rejected by the filter.
>
> So if you only ask for rows that do not exist, this ratio will always show 1.0.
>
> Meaning it is rather a measure of how many of your queries ask for non-existent values.

That sounds like something we should change.

-ryan
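
To make the difference between the two ratios concrete, here is a minimal sketch of the arithmetic described in the quoted message. It is not Cassandra's code: the function names and the counts are made up for illustration, and it assumes a bloom filter never rejects a key that is actually present (no false negatives).

```python
def reported_false_ratio(false_positives, true_positives):
    """Ratio as described in the thread: FP / (FP + TP).

    Only lookups that passed the filter are counted.
    """
    passed = false_positives + true_positives
    return false_positives / passed if passed else 0.0


def per_lookup_false_ratio(false_positives, true_positives, true_negatives):
    """Ratio Daniel expected: unnecessary file lookups / total lookups.

    Lookups rejected by the filter (true negatives) are part of the total.
    """
    total = false_positives + true_positives + true_negatives
    return false_positives / total if total else 0.0


# Hypothetical workload: 1000 lookups, all for rows that do not exist.
# The filter rejects 990 of them and lets 10 through by mistake.
fp, tp, tn = 10, 0, 990

reported = reported_false_ratio(fp, tp)
expected = per_lookup_false_ratio(fp, tp, tn)

print(f"FP / (FP + TP)      = {reported:.2f}")
print(f"FP / total lookups  = {expected:.2f}")
```

With these made-up numbers the first ratio comes out at 1.0 even though only 1% of all lookups touched a file unnecessarily, which matches the observation above that querying only missing rows pins the reported value at 1.0.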
