cassandra-user mailing list archives

From Kai Wang <dep...@gmail.com>
Subject Re: Setting bloom_filter_fp_chance < 0.01
Date Thu, 19 May 2016 14:55:24 GMT
With 50 billion rows and bloom_filter_fp_chance = 0.01, the bloom filters will
consume a lot of off-heap memory. You may want to take that into
consideration too.
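To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes the textbook optimal Bloom filter size of -ln(p) / (ln 2)^2 bits per key and counts a single copy of each partition key. Cassandra's own implementation differs in detail, keeps one filter per SSTable (with STCS the same key can appear in several), and replication multiplies the total, so treat the result as an order-of-magnitude estimate. The function names are hypothetical helpers, not Cassandra APIs.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook optimal bits per key for a Bloom filter: -ln(p) / (ln 2)^2."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / math.log(2) ** 2


def bloom_filter_bytes(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in bytes for num_keys keys (assumption: optimal sizing)."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8


if __name__ == "__main__":
    keys = 50_000_000_000  # 50 billion partition keys, one copy, before replication
    size = bloom_filter_bytes(keys, 0.01)
    print(f"{bloom_bits_per_key(0.01):.2f} bits/key, ~{size / 2**30:.1f} GiB")
```

Under these assumptions it prints about 9.59 bits per key and roughly 55.8 GiB for one copy of 50 billion keys, spread across the cluster's nodes; nodetool cfstats reports the bloom filter space actually used.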

On Wed, May 18, 2016 at 11:53 PM, Adarsh Kumar <adarsh0007@gmail.com> wrote:

> Hi Sai,
>
> We have a use case where we are designing a table that is going to have
> around 50 billion rows, and we require very fast reads. Partitions are not
> that complex/big; each holds some validation data for duplicate checks
> (consisting of 4-5 int and varchar columns). So we are trying various
> options to optimize read performance. Apart from tuning the bloom filter,
> we are trying the following things:
>
> 1). Better data modelling (choosing appropriate partition and clustering
> keys)
> 2). Trying leveled compaction (changing the data model for this one)
>
> Jonathan,
>
> I understand that tuning bloom_filter_fp_chance will not bring a drastic
> performance gain,
> but this is one of the many things we are trying.
> Please let me know if you have any other suggestions to improve read
> performance for this volume of data.
>
> Also, please let me know of any performance benchmarking techniques
> (currently we are planning to trigger massive reads from Spark and check
> cfstats).
>
> NOTE: we will be deploying DSE on EC2, so please share any suggestions
> specific to DSE and EC2.
>
> Adarsh
>
> On Wed, May 18, 2016 at 9:45 PM, Jonathan Haddad <jon@jonhaddad.com>
> wrote:
>
>> The impact is that the bloom filter will get massively bigger, with very
>> little performance benefit, if any.
>>
>> You can't get a false-positive chance of 0 because it's a probabilistic
>> data structure. It tells you either:
>>
>> your data is definitely not here
>> your data has a pretty decent chance of being here
>>
>> but never "it's here for sure"
>>
>> https://en.wikipedia.org/wiki/Bloom_filter
>>
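The "definitely not here / maybe here" distinction falls straight out of how a Bloom filter works. Below is a minimal Python sketch, assuming a plain bit array with k positions derived from one SHA-256 digest by double hashing; the class is a hypothetical illustration, and Cassandra's real filter differs in hashing and storage details. Adding a key only ever sets bits, so a key that was added always tests positive, while a key that was never added can still collide on all k bits and test positive too.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hashed positions per key (illustration only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Assumption: derive k positions from one SHA-256 digest via h1 + i * h2.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        # Only ever sets bits, so an added key can never test negative later.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely not here"; True means "maybe here" (possibly a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter(num_bits=10_000, num_hashes=7)
    for i in range(1_000):
        bf.add(f"key-{i}")
    false_hits = sum(bf.might_contain(f"other-{i}") for i in range(10_000))
    print(f"false-positive rate on unseen keys: {false_hits / 10_000:.4f}")
```

Every inserted key tests positive; with 10,000 bits, 7 hashes and 1,000 keys, roughly 1% of unseen keys also test positive. Pushing that fraction toward 0 takes ever more bits per key, which is the sense in which 0 is unreachable.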
>> On Wed, May 18, 2016 at 11:04 AM sai krishnam raju potturi <
>> pskraju88@gmail.com> wrote:
>>
>>> Hi Adarsh,
>>>     were there any drawbacks to setting bloom_filter_fp_chance to
>>> the default value?
>>>
>>> thanks
>>> Sai
>>>
>>> On Wed, May 18, 2016 at 2:21 AM, Adarsh Kumar <adarsh0007@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> What is the impact of setting bloom_filter_fp_chance < 0.01?
>>>>
>>>> During performance tuning I was trying to tune bloom_filter_fp_chance
>>>> and have the following questions:
>>>>
>>>> 1). Why is bloom_filter_fp_chance = 0 not allowed?
>>>> (https://issues.apache.org/jira/browse/CASSANDRA-5013)
>>>> 2). What is the maximum/recommended value of bloom_filter_fp_chance (if
>>>> we do not have any limitation on bloom filter size)?
>>>>
>>>> NOTE: We are using the default SizeTieredCompactionStrategy on
>>>> Cassandra 2.1.8.621
>>>>
>>>> Thanks in advance :)
>>>>
>>>> Adarsh Kumar
>>>>
>>>
>>>
>
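Adarsh's two questions (why 0 is rejected, and how far below 0.01 is worth going) can be made concrete with the same textbook formula. The Python sketch below assumes optimal sizing of -ln(p) / (ln 2)^2 bits per key and a hypothetical read that must consult 5 SSTables which do not hold the key; each of those is read needlessly with probability p. The 5-SSTable figure and the helper names are illustrative assumptions, not measurements or Cassandra APIs.

```python
import math


def bits_per_key(fp_chance: float) -> float:
    # Textbook optimum: m/n = -ln(p) / (ln 2)^2. It grows without bound as p -> 0,
    # so a false-positive chance of exactly 0 cannot be reached.
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / math.log(2) ** 2


def expected_wasted_reads(fp_chance: float, sstables_without_key: int) -> float:
    # Each SSTable that lacks the key still gets read with probability fp_chance.
    return fp_chance * sstables_without_key


if __name__ == "__main__":
    baseline = bits_per_key(0.01)
    for p in (0.1, 0.01, 0.001, 0.0001):
        print(
            f"p={p:<7} bits/key={bits_per_key(p):5.2f} "
            f"size vs 0.01={bits_per_key(p) / baseline:4.2f}x "
            f"wasted reads (5 SSTables, hypothetical)={expected_wasted_reads(p, 5):.4f}"
        )
```

In this model each tenfold cut in p adds a constant ~4.8 bits per key (0.001 costs 1.5x the memory of 0.01, and 0.0001 costs 2x), while the expected needless SSTable reads per lookup only drop from 0.05 to 0.005.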
