cassandra-user mailing list archives

From Jaydeep Chovatia <chovatia.jayd...@gmail.com>
Subject Re: High Bloom filter false ratio
Date Thu, 18 Feb 2016 21:39:14 GMT
How many partition keys exist for the table that shows this problem? (Or
could you provide the nodetool cfstats output for that table?)

On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle <daemeonr@gmail.com>
wrote:

> The bloom filter buckets the values in a small number of buckets. I have
> been surprised by how many cases I see with large cardinality where a few
> values populate a given bloom leaf, resulting in high false positives, and
> a surprising impact on latencies!
>
> Are you seeing 2:1 ranges between mean and worst-case latencies (allowing
> for GC times)?
>
> Daemeon Reiydelle
> On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <tyler@datastax.com> wrote:
>
>> You can try slightly lowering the bloom_filter_fp_chance on your table.
>>
>> Otherwise, it's possible that you're repeatedly querying one or two
>> partitions that always trigger a bloom filter false positive.  You could
>> try manually tracing a few queries on this table (for non-existent
>> partitions) to see if the bloom filter rejects them.
>>
>> Depending on your Cassandra version, your false positive ratio could be
>> inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>>
>> There are also a couple of recent improvements to bloom filters:
>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>
>>
>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anishek@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> We have a table with a composite partition key of very high cardinality;
>>> it's a combination of (long, long). On the table we have
>>> bloom_filter_fp_chance=0.010000.
>>>
>>> On running "nodetool cfstats" on the 5 nodes in the cluster, we are
>>> seeing "Bloom filter false ratio:" in the range of 0.7-0.9.
>>>
>>> I thought the bloom filter would adjust to the key space cardinality
>>> over time. We have been running the cluster for a long time, but we have
>>> added significant traffic since January this year. That traffic does not
>>> lead to writes in the db, but it does lead to a high volume of reads that
>>> check whether any values exist.
>>>
>>> Are there any settings that can be changed to get a better ratio?
>>>
>>> Thanks
>>> Anishek
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
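
To put rough numbers on the suggestion above to lower bloom_filter_fp_chance, here is a minimal Python sketch of the textbook Bloom filter sizing formulas (bits per key = -ln(p) / (ln 2)^2, hash count = bits per key * ln 2). The function names are hypothetical, and Cassandra's own implementation may pick different bucket and hash counts, so treat the output as an order-of-magnitude estimate only.

```python
import math


def bits_per_key(fp_chance: float) -> float:
    """Bits per key for an ideally sized Bloom filter (textbook formula)."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def optimal_hash_count(fp_chance: float) -> int:
    """Number of hash functions that minimizes false positives at that size."""
    return max(1, round(bits_per_key(fp_chance) * math.log(2)))


def filter_size_mib(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in MiB for num_keys partition keys."""
    return num_keys * bits_per_key(fp_chance) / 8 / (1024 ** 2)


if __name__ == "__main__":
    # Illustrative key count only; it is not taken from the thread.
    keys = 100_000_000
    for p in (0.1, 0.01, 0.001):
        print(p, round(bits_per_key(p), 2), optimal_hash_count(p),
              round(filter_size_mib(keys, p), 1))
```

Under these formulas, going from 0.01 to 0.001 raises the cost from roughly 9.6 to roughly 14.4 bits per key, so a lower target costs about 50% more bloom filter memory.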

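One possible reading of the numbers in this thread, offered as an assumption rather than a diagnosis: if the reported "Bloom filter false ratio" is computed as false positives divided by all positives (false plus true), then a workload dominated by reads for non-existent partitions can show a high ratio even when the filter meets its configured per-key chance. The formula and the function name below are assumptions for illustration; check the metric definition for your Cassandra version.

```python
def reported_false_ratio(fp_chance: float, absent_fraction: float) -> float:
    """Expected false / (false + true) positives for a given read mix.

    fp_chance:       per-lookup false-positive chance of the filter
    absent_fraction: share of reads that target non-existent partitions
    ASSUMPTION: the reported ratio is false positives over all positives.
    """
    false_positives = absent_fraction * fp_chance   # absent keys that pass the filter
    true_positives = 1.0 - absent_fraction          # existing keys always pass
    positives = false_positives + true_positives
    return 0.0 if positives == 0 else false_positives / positives


if __name__ == "__main__":
    # Hypothetical read mixes with bloom_filter_fp_chance = 0.01.
    for absent in (0.5, 0.99, 0.999):
        print(absent, round(reported_false_ratio(0.01, absent), 3))
```

Under that assumption, a filter at 0.01 would report about 0.5 when 99% of reads miss and about 0.9 when 99.9% miss, which is why the read mix is worth checking before retuning the table.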