incubator-cassandra-user mailing list archives

From Alexis Rodríguez <arodrig...@inconcertcc.com>
Subject Re: Reduce Cassandra GC
Date Tue, 16 Apr 2013 14:00:59 GMT
Joel,

You may want to take a look at the DataStax docs [1]:

*The more memory a Cassandra node has, the better read performance. More
RAM allows for larger cache sizes and reduces disk I/O for reads. More RAM
also allows memory tables (memtables) to hold more recently written data.
Larger memtables lead to a fewer number of SSTables being flushed to disk
and fewer files to scan during a read. The ideal amount of RAM depends on
the anticipated size of your hot data.*

*For dedicated hardware, a minimum of 8GB of RAM is needed. DataStax
recommends 16GB - 32GB.*
*Java heap space should be set to a maximum of 8GB or half of your total
RAM, whichever is lower. (A greater heap size has more intense garbage
collection periods.)*
*For a virtual environment use a minimum of 4GB, such as Amazon EC2 Large
instances. For production clusters with a healthy amount of traffic, 8GB is
more common.*
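
A rough sketch of the two sizing rules mentioned in this thread, in case it helps to compare them side by side. The first function follows the guidance quoted above (8GB or half of RAM, whichever is lower); the second follows the table from the 1.1 tuning docs that Tomàs quotes further down. The function names and the sample RAM sizes are made up for illustration, and neither rule takes data size into account.

```python
GB = 1024 ** 3


def heap_cap_rule(system_ram_bytes: int) -> int:
    """Rule quoted above: at most 8GB or half of RAM, whichever is lower."""
    return min(8 * GB, system_ram_bytes // 2)


def heap_table_rule(system_ram_bytes: int) -> int:
    """Table from the 1.1 tuning docs quoted later in this thread."""
    if system_ram_bytes < 2 * GB:
        return system_ram_bytes // 2           # less than 2GB: half of RAM
    if system_ram_bytes <= 4 * GB:
        return 1 * GB                          # 2GB to 4GB: fixed 1GB
    return min(system_ram_bytes // 4, 8 * GB)  # above 4GB: a quarter, capped at 8GB


if __name__ == "__main__":
    # Hypothetical node sizes, not taken from the thread.
    for ram_gb in (1, 4, 16, 64):
        ram = ram_gb * GB
        print(f"{ram_gb}GB RAM: cap rule {heap_cap_rule(ram) / GB:g}GB, "
              f"table rule {heap_table_rule(ram) / GB:g}GB")
```

The two rules give different answers for the same machine, so treat both as starting points to test against rather than exact targets.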

If you like, you can test your configuration by trying different heap sizes
with your data and checking the cache hit rates [2].
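
For comparing runs, the hit rate is just hits divided by requests. A minimal sketch, assuming you have already collected the two counters for each trial by whatever means you prefer; the function name and the numbers in the example are hypothetical.

```python
def cache_hit_rate(hits: int, requests: int) -> float:
    """Fraction of cache lookups that were served from the cache."""
    if requests == 0:
        return 0.0  # no lookups yet, so report 0 rather than dividing by zero
    return hits / requests


if __name__ == "__main__":
    # Hypothetical counters: 50 hits out of 10,000 lookups.
    print(f"{cache_hit_rate(50, 10_000):.1%}")
```

Measuring the same workload at each heap size keeps the rates comparable between trials.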

[1] http://www.datastax.com/docs/1.0/cluster_architecture/cluster_planning
[2] http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
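
To put numbers on the GCInspector line from the original message quoted below, here is a small sketch that pulls out the pause time and heap usage. The regular expression is my own guess at the format based on that single log line (joined back onto one line), so it may need adjusting for other Cassandra versions; the function and field names are hypothetical.

```python
import re

# Pattern inferred from the one log line in this thread; an assumption, not a spec.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<pause_ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)

SAMPLE = (
    "INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) "
    "GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600"
)


def summarize_gc_line(line: str) -> dict:
    """Extract the collector, pause length, and heap usage from a GCInspector line."""
    m = GC_LINE.search(line)
    if m is None:
        raise ValueError("line does not match the expected GCInspector format")
    pause_ms = int(m["pause_ms"])
    used, max_heap = int(m["used"]), int(m["max"])
    return {
        "collector": m["collector"],
        "pause_ms": pause_ms,
        "pause_minutes": pause_ms / 60_000,
        "collections": int(m["count"]),
        "heap_used_fraction": used / max_heap,
    }


if __name__ == "__main__":
    info = summarize_gc_line(SAMPLE)
    print(f"{info['collector']}: {info['pause_minutes']:.1f} min pause, "
          f"heap {info['heap_used_fraction']:.0%} used")
```

For that line, the pause works out to a bit over five and a half minutes with the heap a little under 57% used, which matches Joel's description of long pauses without a full heap.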


On Tue, Apr 16, 2013 at 9:18 AM, Joel Samuelsson
<samuelsson.joel@gmail.com> wrote:

> Indeed!
> What I meant was:
> If 1GB heap is too low for >40GB data, how can I know what an appropriate
> heap is for various data sizes?
>
>
> 2013/4/16 Tomàs Núnez <tomas.nunez@groupalia.com>
>
>> Hi!
>> Reading the documentation, heap size calculation is done without counting
>> data size.
>>
>> Extract from
>> http://www.datastax.com/docs/1.1/operations/tuning#heap-sizing
>>
>> System Memory      Heap Size
>> Less than 2GB      1/2 of system memory
>> 2GB to 4GB         1GB
>> Greater than 4GB   1/4 system memory, but not more than 8GB
>>
>>
>>
>> 2013/4/16 Joel Samuelsson <samuelsson.joel@gmail.com>
>>
>>> How do you calculate the heap / data size ratio? Is this a linear ratio?
>>>
>>> Each node has slightly more than 12 GB right now though.
>>>
>>>
>>> 2013/4/16 Viktor Jevdokimov <Viktor.Jevdokimov@adform.com>
>>>
>>>> For >40GB of data, 1GB of heap is too low.
>>>>
>>>>    Best regards / Pagarbiai
>>>> *Viktor Jevdokimov*
>>>> Senior Developer
>>>>
>>>> Email: Viktor.Jevdokimov@adform.com
>>>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>>>> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>>>>
>>>>   *From:* Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
>>>> *Sent:* Tuesday, April 16, 2013 10:47
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Reduce Cassandra GC
>>>>
>>>> Hi,
>>>>
>>>> We have a small production cluster with two nodes. The load on the
>>>> nodes is very small, around 20 reads / sec and about the same for writes.
>>>> There are around 2.5 million keys in the cluster and an RF of 2.
>>>>
>>>> About 2.4 million of the rows are skinny (6 columns) and around 3 KB in
>>>> size (each). Currently, scripts are running, accessing all of the keys in
>>>> time order to do some calculations.
>>>>
>>>> While running the scripts, the nodes go down and then come back up 6-7
>>>> minutes later. This seems to be due to GC. I get lines like this in the log:
>>>>
>>>> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
>>>> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
>>>> 1046937600
>>>>
>>>> However, the heap is not full. The heap usage has a jagged pattern,
>>>> going from 60% up to 70% over 5 minutes and then back down to 60% over the
>>>> next 5 minutes and so on. I get no "Heap is X full..." messages. Every once
>>>> in a while, at one of these peaks, I get a stop-the-world GC lasting 6-7
>>>> minutes. Why does GC take up so much time even though the heap isn't full?
>>>>
>>>> I am aware that my access patterns make a high key cache hit ratio very
>>>> unlikely. And indeed, my average key cache hit ratio during the run of the
>>>> scripts is around 0.5%. I tried disabling key caching on the accessed
>>>> column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
>>>> cassandra-cli but I get the same behaviour. Does turning the key cache off
>>>> take effect immediately?
>>>>
>>>> Stop-the-world GC is fine if it lasts a few seconds, but pauses of
>>>> several minutes don't work. Any other suggestions for getting rid of them?
>>>>
>>>> Best regards,
>>>>
>>>> Joel Samuelsson
>>>>
>>>
>>>
>>
>>
>> --
>> Tomàs Núñez, IT-Sysprod, Groupalia <http://es.groupalia.com/>
>> Tel. +34 93 159 31 00, Fax +34 93 396 18 52
>> Llull, 95-97, 2º planta, 08005 Barcelona
>> Skype: tomas.nunez.groupalia, tomas.nunez@groupalia.com
>>
>
>
