incubator-cassandra-user mailing list archives

From Ananth Gundabattula <agundabatt...@threatmetrix.com>
Subject Re: What is the effect of reducing the thrift message sizes on GC
Date Tue, 18 Jun 2013 08:22:22 GMT
Thanks Aaron for the insight.

One quick question:

>The buffers are not pre allocated, but once they are allocated they are
>not returned. So it's only an issue if you have lots of clients connecting
>and reading a lot of data.
So to understand you correctly, the buffer is allocated per client
connection, stays allocated for the lifetime of the JVM, and is reused for
each request?
If that is the case, then I presume there is not much to gain from tuning
this setting as far as reducing GC pressure is concerned.
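
To put a rough number on the worst case, here is a minimal sketch. It assumes
(as my question above does, and this is unconfirmed) that one buffer is held
per client connection and that each buffer can grow up to the framed transport
size. The function name and the connection count are hypothetical.

```python
def worst_case_retained_mb(connections: int, frame_size_mb: int) -> int:
    """Upper bound on heap held by thrift buffers, in MB.

    Assumption: one buffer per client connection, each grown to the
    configured thrift_framed_transport_size_in_mb and never returned.
    """
    if connections < 0 or frame_size_mb < 0:
        raise ValueError("inputs must be non-negative")
    return connections * frame_size_mb


# Hypothetical pool of 200 client connections.
for frame_mb in (15, 1):
    print(frame_mb, "MB frames ->", worst_case_retained_mb(200, frame_mb), "MB at most")
```

Since our requests and responses are tiny, the buffers should stay far below
this bound, so lowering the limit would act as a cap rather than reduce what
is actually retained.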

>reduce bloom filters, index intervals ...
Well, we have tried all the configs advised below (and others, such as key
cache sizes) and hit a dead end, which is the reason for the move to 1.2.4.
Thanks for all your thoughts and advice on this.
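
For anyone sizing the same settings, here is a back-of-the-envelope sketch of
why the bloom filters and index samples matter at this row count. It uses the
textbook bloom filter formula (bits per key = -ln(p) / (ln 2)^2), not
Cassandra's exact allocation, and assumes an index_interval of 128 as the
starting point. The helper names are made up for illustration.

```python
import math


def bloom_filter_mib(rows: int, fp_chance: float) -> float:
    """Approximate size of an optimally sized bloom filter, in MiB."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be between 0 and 1")
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return rows * bits_per_key / 8 / (1024 ** 2)


def index_samples(rows: int, index_interval: int) -> int:
    """Number of row keys kept in memory when every Nth key is sampled."""
    return rows // index_interval


rows = 500_000_000  # the per-node figure Aaron mentions
for p in (0.01, 0.1):
    print(f"fp_chance={p}: ~{bloom_filter_mib(rows, p):.0f} MiB of bloom filter")
for interval in (128, 512):
    print(f"index_interval={interval}: {index_samples(rows, interval):,} samples")
```

The estimate ignores rows duplicated across SSTables, so the real footprint
on a node can be higher.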


Regards,
Ananth 



On 6/18/13 5:56 PM, "aaron morton" <aaron@thelastpickle.com> wrote:

>> *thrift_framed_transport_size_in_mb & thrift_max_message_length_in_mb*
>These control the max size of a buffer allocated by thrift when processing
>requests / responses. The buffers are not pre allocated, but once they
>are allocated they are not returned. So it's only an issue if you have lots
>of clients connecting and reading a lot of data.
>
>> Our system has very short column tables (short both in number of columns
>> and data sizes) but millions/billions of rows in each column family.
>If you have over 500 million rows per node you may be running into issues
>with the bloom filters and index samples.
>
>This typically looks like the heap usage not reducing after a CMS
>collection has completed.
>
>Ensure the bloom_filter_fp_chance on the CFs is set to 0.01 for size
>tiered compaction and 0.1 for levelled compaction. If you need to change
>it, run nodetool upgradesstables.
>
>Then consider increasing the index_interval in the yaml file, see the
>comments. 
>
>Note that v 1.2 moves the bloom filters off heap, so if you upgrade to
>1.2 it will probably resolve your issues.
>
>Cheers
>
>-----------------
>Aaron Morton
>Freelance Cassandra Consultant
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 18/06/2013, at 7:30 PM, Ananth Gundabattula
><agundabattula@threatmetrix.com> wrote:
>
>> We are currently running on 1.1.10 and planning to migrate to a higher
>> version 1.2.4.
>> 
>> The question pertains to tweaking all the knobs to reduce GC related
>> issues (we have been fighting a lot of really bad GC issues on 1.1.10 and
>> met with little success all the way using 1.1.10).
>> 
>> Taking into consideration GC tuning is a black art, I was wondering if we
>> can have some good effect on the GC by tweaking the following settings:
>> 
>> *thrift_framed_transport_size_in_mb & thrift_max_message_length_in_mb*
>> 
>> Our system has very short column tables (short both in number of columns
>> and data sizes) but millions/billions of rows in each column family. The
>> typical number of columns in each column family is 4. The typical lookup
>> involves specifying the row key and fetching one column most of the time.
>> The writes are similar, except for one keyspace where the number of
>> columns is 50 but with very small data sizes per column.
>> 
>> Assuming we can tweak the config values:
>> 
>> *thrift_framed_transport_size_in_mb & thrift_max_message_length_in_mb*
>> 
>> to lower values in the above context, I was wondering if it helps in the
>> GC being invoked less if the thrift settings reflect our data model reads
>> and writes?
>> 
>> For example: what is the impact on the GC of reducing the above config
>> values to, say, 1 MB rather than 15 or 16?
>> 
>> Thanks a lot for your inputs and thoughts.
>> 
>> 
>> Regards,
>> Ananth
>

