cassandra-user mailing list archives

From Anishek Agarwal <anis...@gmail.com>
Subject Re: High Bloom filter false ratio
Date Mon, 22 Feb 2016 04:53:54 GMT
We are using DTCS and have a 30-day window for them before they are cleaned up.
I don't think with DTCS we can do anything about table sizing. Please do
let me know if there are other ideas.
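
For a rough sense of scale, the cfstats figures quoted below can be compared
with the textbook Bloom filter sizing formula. This is a minimal Python sketch,
not Cassandra's own sizing code. The helper names are made up for illustration,
and it assumes that every SSTable's filter is consulted on a read and that each
filter errs independently at the configured bloom_filter_fp_chance.

```python
import math

# Figures copied from the cfstats output quoted in this thread.
sstable_count = 1289
estimated_keys = 345_137_664
bloom_filter_bytes = 493_777_336
fp_chance = 0.01  # bloom_filter_fp_chance on the table


def optimal_bits_per_key(p):
    """Textbook bits per key for an optimally configured Bloom filter."""
    return -math.log(p) / (math.log(2) ** 2)


def expected_false_positives_per_read(sstables, p):
    """Expected number of filters that wrongly answer 'maybe present'.

    Assumption (not from the thread): a read for an absent key consults
    every SSTable's filter, and each errs independently with probability p.
    """
    return sstables * p


actual_bits_per_key = bloom_filter_bytes * 8 / estimated_keys
textbook_bytes = optimal_bits_per_key(fp_chance) * estimated_keys / 8
fps_per_read = expected_false_positives_per_read(sstable_count, fp_chance)

print(f"actual bits per key:   {actual_bits_per_key:.2f}")
print(f"textbook bits per key: {optimal_bits_per_key(fp_chance):.2f}")
print(f"textbook filter size:  {textbook_bytes / 1e6:.0f} MB")
print(f"expected FPs per read: {fps_per_read:.2f}")
```

Under those assumptions the filter memory is in the same ballpark as the
textbook figure for this many keys, while the SSTable count multiplies the
number of false positives a single read for a missing key can expect to hit.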

On Sat, Feb 20, 2016 at 12:51 AM, Jaydeep Chovatia <
chovatia.jaydeep@gmail.com> wrote:

> To me, the following three look to be on the higher side:
> SSTable count: 1289
>
> In order to reduce the SSTable count, check whether you are compacting or not (if
> using STCS). Is it possible to change this to LCS?
>
>
> Number of keys (estimate): 345137664 (345M partition keys)
>
> I don't have any suggestion about reducing this unless you partition your
> data.
>
>
> Bloom filter space used, bytes: 493777336 (roughly 494 MB, which is huge)
>
> If the number of keys is reduced, then this will automatically reduce the bloom
> filter size, I believe.
>
>
>
> Jaydeep
>
> On Thu, Feb 18, 2016 at 7:52 PM, Anishek Agarwal <anishek@gmail.com>
> wrote:
>
>> Hey all,
>>
>> @Jaydeep here is the cfstats output from one node.
>>
>> Read Count: 1721134722
>>
>> Read Latency: 0.04268825050756254 ms.
>>
>> Write Count: 56743880
>>
>> Write Latency: 0.014650376727851532 ms.
>>
>> Pending Tasks: 0
>>
>> Table: user_stay_points
>>
>> SSTable count: 1289
>>
>> Space used (live), bytes: 122141272262
>>
>> Space used (total), bytes: 224227850870
>>
>> Off heap memory used (total), bytes: 653827528
>>
>> SSTable Compression Ratio: 0.4959736121441446
>>
>> Number of keys (estimate): 345137664
>>
>> Memtable cell count: 339034
>>
>> Memtable data size, bytes: 106558314
>>
>> Memtable switch count: 3266
>>
>> Local read count: 1721134803
>>
>> Local read latency: 0.048 ms
>>
>> Local write count: 56743898
>>
>> Local write latency: 0.018 ms
>>
>> Pending tasks: 0
>>
>> Bloom filter false positives: 40664437
>>
>> Bloom filter false ratio: 0.69058
>>
>> Bloom filter space used, bytes: 493777336
>>
>> Bloom filter off heap memory used, bytes: 493767024
>>
>> Index summary off heap memory used, bytes: 91677192
>>
>> Compression metadata off heap memory used, bytes: 68383312
>>
>> Compacted partition minimum bytes: 104
>>
>> Compacted partition maximum bytes: 1629722
>>
>> Compacted partition mean bytes: 1773
>>
>> Average live cells per slice (last five minutes): 0.0
>>
>> Average tombstones per slice (last five minutes): 0.0
>>
>>
>> @Tyler Hobbs
>>
>> We are using Cassandra 2.0.15, so
>> https://issues.apache.org/jira/browse/CASSANDRA-8525 shouldn't occur.
>> It looks like the other problems will be fixed in 3.0. We will most likely try to
>> slot in an upgrade to a 3.x version towards the second quarter of this year.
>>
>>
>> @Daemeon
>>
>> Latencies seem to have higher ratios, attached is the graph.
>>
>>
>> I am mostly trying to look at Bloom filters because of the way we do reads:
>> we read data with non-existent partition keys, and it seems to be taking
>> long to respond. For example, 720 queries take 2 seconds, with all 720
>> queries not returning anything. The 720 queries are done in sequential batches
>> of 180 queries each, with the 180 in a batch running in parallel.
>>
>>
>> thanks
>>
>> anishek
>>
>>
>>
>> On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <
>> chovatia.jaydeep@gmail.com> wrote:
>>
>>> How many partition keys exist for the table which shows this problem
>>> (or provide nodetool cfstats for that table)?
>>>
>>> On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle <daemeonr@gmail.com>
>>> wrote:
>>>
>>>> The bloom filter buckets the values in a small number of buckets. I
>>>> have been surprised by how many cases I see with large cardinality where a
>>>> few values populate a given bloom leaf, resulting in high false positives,
>>>> and a surprising impact on latencies!
>>>>
>>>> Are you seeing 2:1 ranges between mean and worst-case latencies
>>>> (allowing for gc times)?
>>>>
>>>> Daemeon Reiydelle
>>>> On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <tyler@datastax.com> wrote:
>>>>
>>>>> You can try slightly lowering the bloom_filter_fp_chance on your table.
>>>>>
>>>>> Otherwise, it's possible that you're repeatedly querying one or two
>>>>> partitions that always trigger a bloom filter false positive.  You could
>>>>> try manually tracing a few queries on this table (for non-existent
>>>>> partitions) to see if the bloom filter rejects them.
>>>>>
>>>>> Depending on your Cassandra version, your false positive ratio could
>>>>> be inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>>>>>
>>>>> There are also a couple of recent improvements to bloom filters:
>>>>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>>>>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>>>>
>>>>>
>>>>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anishek@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We have a table with a composite partition key of humongous
>>>>>> cardinality; it's a combination of (long,long). On the table we have
>>>>>> bloom_filter_fp_chance=0.010000.
>>>>>>
>>>>>> On running "nodetool cfstats" on the 5 nodes we have in the cluster, we
>>>>>> are seeing "Bloom filter false ratio:" in the range of 0.7-0.9.
>>>>>>
>>>>>> I thought that over time the bloom filter would adjust to the key space
>>>>>> cardinality. We have been running the cluster for a long time now but have
>>>>>> added significant traffic since January this year, which does not lead to
>>>>>> writes in the db but does lead to high reads to see if there are any values.
>>>>>>
>>>>>> Are there any settings that can be changed to allow a better ratio?
>>>>>>
>>>>>> Thanks
>>>>>> Anishek
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Tyler Hobbs
>>>>> DataStax <http://datastax.com/>
>>>>>
>>>>
>>>
>>
>
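
One way to read the "Bloom filter false ratio" figures quoted above: as far as
I understand it, Cassandra reports false positives divided by all filter hits
(false positives plus true positives), and does not count the lookups that the
filter correctly rejected. That definition is an assumption here, not something
stated in the thread. If it holds, a workload that mostly queries non-existent
keys has few true positives, so the ratio can look high even when the filter
rejects most lookups. A minimal Python sketch with hypothetical helper names:

```python
# Figures copied from the cfstats output quoted in this thread.
false_positives = 40_664_437
false_ratio = 0.69058
local_reads = 1_721_134_803


def implied_true_positives(fp, ratio):
    """Solve ratio = fp / (fp + tp) for tp (assumed definition of the ratio)."""
    return fp * (1.0 - ratio) / ratio


def false_ratio_from_counts(fp, tp):
    """Recompute the ratio from raw counts; 0.0 when no lookup passed a filter."""
    total = fp + tp
    return fp / total if total else 0.0


tp = implied_true_positives(false_positives, false_ratio)
fp_per_read = false_positives / local_reads

print(f"implied true positives:         {tp:,.0f}")
print(f"false positives per local read: {fp_per_read:.4f}")
```

Tracing a few reads for non-existent partitions, as suggested earlier in the
thread, would show directly whether the filters reject them.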
