incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: How much heap does Cassandra 1.1.11 really need ?
Date Mon, 06 May 2013 09:13:36 GMT
My general "I can haz heap space?" approach. 

* determine total row count for the node from cfstats
* determine if wide (10's of MB) rows are in use
* determine total bloom filter space for the node from cfstats
* enable full GC logging in cassandra-env.sh
* determine tenured heap low point not long after startup and after running for a while. 
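
A rough sketch of the first and third steps: totalling the row estimate and bloom filter space from saved `nodetool cfstats` output. The two label strings are assumptions about how Cassandra 1.1 prints these values, and `summarize_cfstats` is a hypothetical helper name; check the labels against your own output before trusting the totals.

```python
# Labels assumed to match `nodetool cfstats` output on Cassandra 1.1;
# adjust them if your version prints different wording.
KEYS_LABEL = "Number of Keys (estimate)"
BLOOM_LABEL = "Bloom Filter Space Used"


def summarize_cfstats(text):
    """Return (total_keys, total_bloom_bytes) summed over all column families."""
    total_keys = 0
    total_bloom = 0
    for line in text.splitlines():
        # Each stat is printed as "<label>: <value>"; skip anything else.
        label, sep, value = line.strip().partition(":")
        if not sep:
            continue
        value = value.strip()
        if not value.isdigit():
            continue
        if label == KEYS_LABEL:
            total_keys += int(value)
        elif label == BLOOM_LABEL:
            total_bloom += int(value)
    return total_keys, total_bloom
```

Feed it the text captured from `nodetool cfstats` on the node; the bloom filter total is in whatever unit cfstats reports (assumed here to be bytes).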

Consider locking the memtable_total_space_in_mb to 2048 rather than 1/3 heap while tuning.


Consider changing the JVM GC settings as below to check for premature tenuring (a possibility with wide rows
and wide reads):
	HEAP_NEWSIZE="1200M"
	JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4" 
	JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"

^ Look at the tenuring distribution in the GC log to see how many objects are making it through 4 ParNew
passes. Afterwards you will want to return the settings to something closer to the defaults, maybe
HEAP_NEWSIZE 1000M, SurvivorRatio 4, MaxTenuringThreshold 2.
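
To see what those numbers mean for the young generation, here is a small sketch of the arithmetic. It relies on HotSpot's definition of SurvivorRatio (eden size divided by the size of one survivor space, with two survivor spaces in the young generation). `young_gen_layout` is a hypothetical helper, and the real JVM aligns sizes, so treat the result as approximate.

```python
def young_gen_layout(new_size_mb, survivor_ratio):
    """Split the young generation into eden and two survivor spaces.

    SurvivorRatio = eden size / one survivor space size, and the young
    generation holds one eden plus two survivor spaces.
    """
    survivor_mb = new_size_mb / (survivor_ratio + 2.0)
    eden_mb = survivor_mb * survivor_ratio
    return eden_mb, survivor_mb


eden, survivor = young_gen_layout(1200, 4)
print("eden=%.0fM survivor=%.0fM each" % (eden, survivor))
```

With a 1200M new size and SurvivorRatio=4 this gives an 800M eden and two 200M survivor spaces, so objects get more room to age out before being tenured.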

If there are > 500 million rows and/or the bloom filter size is > 750 MB, consider:
	raise bloom_filter_fp_chance (per CF) to 0.01 or 0.1 to shrink the bloom filters, then run nodetool upgradesstables
	increase index_interval in the yaml to reduce the number of index samples
	watch the key cache hit rate and consider increasing the key cache to 200MB
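
As a back-of-the-envelope check on the bloom filter numbers, the sketch below uses the textbook formula for an optimally sized Bloom filter, bits per key = -ln(p) / (ln 2)^2. This is an assumption about sizing, not Cassandra's actual code (which may round per SSTable), and both function names are hypothetical.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Bits per key for an optimally sized Bloom filter: -ln(p) / (ln 2)^2."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_mb(rows, fp_chance):
    """Approximate filter size in MB (10^6 bytes) for `rows` keys."""
    return rows * bloom_bits_per_key(fp_chance) / 8 / 1e6


rows = 500 * 1000 * 1000  # the 500 million row threshold from the message
for p in (0.000744, 0.01, 0.1):
    print("fp_chance=%g -> ~%.0f MB" % (p, bloom_size_mb(rows, p)))
```

Under this formula a false positive chance of 0.000744 costs about 15 bits per key, so at 500 million rows the filters land in the high hundreds of MB, while 0.01 and 0.1 come out at roughly 600 MB and 300 MB.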

If you have a high tenured heap that is not decreasing after CMS, the first place to look is
the bloom filters and index samples. For a CF where bloom_filter_fp_chance is not specified,
the value is 0.000744.
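
One way to spot a heap floor that is not coming down is to pull the post-GC occupancy out of the GC log. The sketch below assumes the simple one-line format quoted further down in this thread (`[GC before->after(total), secs]`); it will not match the more detailed output produced by full GC logging, and the post-GC total is only a proxy for tenured occupancy since it includes the survivor space. The function names are hypothetical.

```python
import re

# Matches simple GC log entries such as
# "339814.357: [GC 9178946K->9176740K(16567552K), 0.0265830 secs]".
# Whitespace is matched loosely so entries wrapped by a mail client still parse.
GC_LINE = re.compile(
    r"(\d+\.\d+): \[(?:Full )?GC\s+(\d+)K->(\d+)K\((\d+)K\),\s+([\d.]+)\s+secs\]"
)


def post_gc_occupancy(log_text):
    """Return a list of (uptime_secs, heap_after_kb, heap_total_kb) per GC entry."""
    return [
        (float(m.group(1)), int(m.group(3)), int(m.group(4)))
        for m in GC_LINE.finditer(log_text)
    ]


def low_point(entries, after_uptime_secs=0.0):
    """Smallest post-GC heap occupancy (KB) seen at or after the given uptime."""
    values = [after for uptime, after, _ in entries if uptime >= after_uptime_secs]
    return min(values) if values else None
```

Comparing `low_point` shortly after startup with `low_point` over the last few hours of the log gives the two numbers the checklist above asks for.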

Hope that helps. 
  
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/05/2013, at 7:20 AM, Oleg Dulin <oleg.dulin@gmail.com> wrote:

> What constitutes an "extreme write" ?
> 
> On 2013-05-03 15:45:33 +0000, Edward Capriolo said:
> 
> If your writes are so extreme that memtables are flushing all the time, the best you can do is turn off all caches, move bloom filters off heap, and then instruct Cassandra to use large portions of the heap as memtables.
> 
> 
> On Fri, May 3, 2013 at 11:40 AM, Bryan Talbot <btalbot@aeriagames.com> wrote:
> It's true that a 16GB heap is generally not a good idea; however, it's not clear from the data provided what problem you're trying to solve.
> 
> What is it that you don't like about the default settings?
> 
> -Bryan
> 
> 
> 
> On Fri, May 3, 2013 at 4:27 AM, Oleg Dulin <oleg.dulin@gmail.com> wrote:
> Here is my question. It can't possibly be a good setup to use a 16 GB heap, but this is the best I can do. Setting it to the default never worked well for me, and setting it to 8 GB doesn't work well either. It can't keep up with flushing memtables. It is possible that someone at some point may have broken something in the config files. If I were to look for hints there, what should I look at?
> 
> Look at my gc log from Cassandra:
> 
> Starts off like this:
> 
> 2013-04-29T08:53:44.548-0400: 5.386: [GC 1677824K->11345K(16567552K), 0.0509880 secs]
> 2013-04-29T08:53:47.701-0400: 8.539: [GC 1689169K->42027K(16567552K), 0.1269180 secs]
> 2013-04-29T08:54:05.361-0400: 26.199: [GC 1719851K->231763K(16567552K), 0.1436070 secs]
> 2013-04-29T08:55:44.797-0400: 125.635: [GC 1909587K->1480096K(16567552K), 1.2626270 secs]
> 2013-04-29T08:58:44.367-0400: 305.205: [GC 3157920K->2358588K(16567552K), 1.1198150 secs]
> 2013-04-29T09:01:12.167-0400: 453.005: [GC 4036412K->3634298K(16567552K), 1.0098650 secs]
> 2013-04-29T09:03:35.204-0400: 596.042: [GC 5312122K->4339703K(16567552K), 0.4597180 secs]
> 2013-04-29T09:04:51.562-0400: 672.400: [GC 6017527K->4956381K(16567552K), 0.5361800 secs]
> 2013-04-29T09:04:59.205-0400: 680.043: [GC 6634205K->5131825K(16567552K), 0.1741690 secs]
> 2013-04-29T09:05:06.638-0400: 687.476: [GC 6809649K->5027933K(16567552K), 0.0607470 secs]
> 2013-04-29T09:05:13.908-0400: 694.747: [GC 6705757K->5012439K(16567552K), 0.0624410 secs]
> 2013-04-29T09:05:20.909-0400: 701.747: [GC 6690263K->5039538K(16567552K), 0.0618750 secs]
> 2013-04-29T09:06:35.914-0400: 776.752: [GC 6717362K->5819204K(16567552K), 0.5738550 secs]
> 2013-04-29T09:08:05.589-0400: 866.428: [GC 7497028K->6678597K(16567552K), 0.6781900 secs]
> 2013-04-29T09:08:12.458-0400: 873.296: [GC 8356421K->6865736K(16567552K), 0.1423040 secs]
> 2013-04-29T09:08:18.690-0400: 879.529: [GC 8543560K->6742902K(16567552K), 0.0516470 secs]
> 2013-04-29T09:08:24.914-0400: 885.752: [GC 8420726K->6725877K(16567552K), 0.0517290 secs]
> 2013-04-29T09:08:31.008-0400: 891.846: [GC 8403701K->6741781K(16567552K), 0.0532540 secs]
> 2013-04-29T09:08:37.201-0400: 898.039: [GC 8419605K->6759614K(16567552K), 0.0563290 secs]
> 2013-04-29T09:08:43.493-0400: 904.331: [GC 8437438K->6772147K(16567552K), 0.0569580 secs]
> 2013-04-29T09:08:49.757-0400: 910.595: [GC 8449971K->6776883K(16567552K), 0.0558070 secs]
> 2013-04-29T09:08:55.973-0400: 916.812: [GC 8454707K->6789404K(16567552K), 0.0577230 secs]
> 
> ……
> 
> 
> look what it is today:
> 
> 2013-05-03T07:17:13.519-0400: 339814.357: [GC 9178946K->9176740K(16567552K), 0.0265830 secs]
> 2013-05-03T07:17:19.556-0400: 339820.394: [GC 10854564K->9178449K(16567552K), 0.0253180 secs]
> 2013-05-03T07:17:24.390-0400: 339825.228: [GC 10856273K->9179073K(16567552K), 0.0266450 secs]
> 2013-05-03T07:17:30.729-0400: 339831.567: [GC 10856897K->9178629K(16567552K), 0.0261150 secs]
> 2013-05-03T07:17:35.584-0400: 339836.422: [GC 10856453K->9178586K(16567552K), 0.0250870 secs]
> 2013-05-03T07:17:38.514-0400: 339839.352: [GC 10856410K->9179314K(16567552K), 0.0258120 secs]
> 2013-05-03T07:17:43.200-0400: 339844.038: [GC 10857138K->9180160K(16567552K), 0.0250150 secs]
> 2013-05-03T07:17:46.566-0400: 339847.404: [GC 10857984K->9179071K(16567552K), 0.0264420 secs]
> 2013-05-03T07:17:52.913-0400: 339853.751: [GC 10856895K->9179870K(16567552K), 0.0262430 secs]
> 2013-05-03T07:17:58.303-0400: 339859.141: [GC 10857694K->9179209K(16567552K), 0.0255130 secs]
> 2013-05-03T07:18:03.427-0400: 339864.265: [GC 10857033K->9178316K(16567552K), 0.0263140 secs]
> 2013-05-03T07:18:11.657-0400: 339872.495: [GC 10856140K->9178351K(16567552K), 0.0265340 secs]
> 2013-05-03T07:18:17.429-0400: 339878.267: [GC 10856175K->9179067K(16567552K), 0.0254820 secs]
> 2013-05-03T07:18:21.251-0400: 339882.089: [GC 10856891K->9179680K(16567552K), 0.0264210 secs]
> 2013-05-03T07:18:25.062-0400: 339885.900: [GC 10857504K->9178985K(16567552K), 0.0267200 secs]
> 
> 
> 
> 
> -- 
> Regards,
> Oleg Dulin
> NYC Java Big Data Engineer
> http://www.olegdulin.com/
> 
> 

