incubator-cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: Follow-up post on cassandra configuration with some experiments on GC tuning
Date Sat, 28 Aug 2010 08:02:28 GMT
> Before I applied these changes the young gen and the survivor space
> were very spiky. Now they both seem very low all the time. As you see
> from my screen shot, before these changes my JVM memory would make
> large saw tooths, now all three pools young, eden, perm seem smoother.

I'm not sure what's going on in Mikio's original graph (I don't see
why CMS-i would cause lower average memory usage; I think something
else is going on there).

With respect to your graph:

   http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/tune_that_jvm

That just looks like a concurrent mark/sweep has not completed at all
after the dip around 14:00. Sooner or later it should either
complete a concurrent mark/sweep, resulting in a dip, or fail to
complete it in time, causing a fallback to a full GC. Unless CMS-i
does something *completely* different that I have completely missed,
you should definitely expect a sudden dip once a mark/sweep
finishes.
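
One way to check from inside the JVM whether an old generation
collection has completed is to read the standard java.lang.management
beans. The following is only a minimal sketch: the class name GcWatch
is made up for illustration, and the collector and pool names it
prints depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of Cassandra or the JDK.
public class GcWatch {

    /** Completed collections per collector; -1 means the JVM reports no count. */
    public static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("collections so far: " + collectionCounts());
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // only heap pools (young, survivor, old) are of interest here
            }
            MemoryUsage now = pool.getUsage();               // null if the pool is not valid
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if not supported for this pool
            System.out.printf("%s: used now=%s bytes, used after last collection=%s bytes%n",
                    pool.getName(),
                    now == null ? "n/a" : String.valueOf(now.getUsed()),
                    afterGc == null ? "n/a" : String.valueOf(afterGc.getUsed()));
        }
    }
}
```

Polling this periodically, a completed mark/sweep should show up as
the old generation collector's count going up, together with a lower
"used after last collection" figure for the old generation pool.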

The minor saw-toothy behavior seen on the slope is mostly going to be
dictated by the young generation size chosen by the collector.
Possibly it chooses a smaller young generation when CMS-i is enabled
(speculation).

Also, note that lack of saw-toothing is not a goal in and of itself
and may even be bad. For example, with respect to the young generation
the situation is essentially:

(1) The larger the young generation, the more significant the saw-tooth.
(2) The larger the young generation, the more efficient the GC (if the
application behaves according to the weak generational hypothesis -
google it if you want a ref) because less data is promoted to old gen
and because the overhead of stop-the-world is lessened.
(3) The larger the young generation, the longer the pause times to do
collections of the young generation.

A few consequences of those include:

* Assuming a parallel collection of the young generation, more CPUs
mean that the optimal size of the young generation given a certain
pause time goal is higher. In other words, more CPUs -> more saw
tooth.

* Lack of saw tooth may just indicate that the majority of data
survives the young generation collections. This is not a good thing;
at best it's neutral because the application is simply such that it
does not generate a lot of temporary garbage (i.e., it does NOT adhere
to the weak generational hypothesis). At worst it means GC will be
more expensive overall because the "per object" cost of collecting the
old generation is significantly higher than the "per object" cost of
collecting the young generation. That said, if most data is truly very
transient in nature, a smaller young generation may still be "big
enough".

* The previous point highlights the trade-off between low pause times
and GC efficiency. One might force a smaller young generation in an
attempt to achieve shorter pause times with CMS, but the trade-off is
that a larger percentage of allocated data will survive into the old
generation and be collected there - more expensively.
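
To put rough numbers on the size/efficiency trade-off, here is a toy
model. It is only a sketch under assumptions that are not from the
discussion above: a constant allocation rate, exponentially
distributed object lifetimes, and a young generation that is collected
exactly when it fills up (survivor spaces and tenuring thresholds are
ignored). The class name, method name and the example rates are all
hypothetical. Under those assumptions, with T the time between young
collections and m the mean object lifetime, the fraction of allocated
bytes still live at collection time is (m/T) * (1 - exp(-T/m)).

```java
// Toy model only; all names and parameter values are illustrative assumptions.
public class YoungGenModel {

    /**
     * Fraction of bytes allocated since the last young collection that are
     * still live when the young generation fills up, assuming a constant
     * allocation rate and exponentially distributed object lifetimes.
     */
    public static double survivingFraction(double youngGenBytes,
                                           double allocBytesPerSec,
                                           double meanLifetimeSec) {
        double interval = youngGenBytes / allocBytesPerSec; // seconds between collections
        double x = interval / meanLifetimeSec;
        return (1.0 - Math.exp(-x)) / x;
    }

    public static void main(String[] args) {
        double allocRate = 200e6;    // assumed: 200 MB/s of allocation
        double meanLifetime = 0.05;  // assumed: objects live 50 ms on average
        for (double mb : new double[] {16, 64, 256, 1024}) {
            double fraction = survivingFraction(mb * 1e6, allocRate, meanLifetime);
            System.out.printf("young gen %5.0f MB: collection every %.2f s, %.1f%% still live%n",
                    mb, mb * 1e6 / allocRate, 100 * fraction);
        }
    }
}
```

In this model the surviving fraction only goes down as the young
generation grows, which is the efficiency argument in (2); the cost is
the longer interval and bigger saw-tooth from (1) and, in a real
collector, the longer pauses from (3).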

> I am worried that the cms descriptions talk about systems with 1-2
> processor machines, being my system shows up at 16 processors after
> hyper threading.

My assumption has been that this recommendation is due to the fact
that the more processors you have, the less impact the CMS mark/sweep
phase may have on application throughput provided that an appropriate
number of threads is selected. So for example, if you have an 8 core
machine and have CMS use only a single thread for the mark/sweep
phase, the very fact that it is only using 1 out of 8 cores should
severely limit its impact. (Of course cache coherency issues
presumably negate this somewhat.)

Under such circumstances, incremental CMS does not seem worth it. On
the other hand, suppose you're running on a single CPU system.
Disregarding CPU cache issues, the concurrent mark/sweep phase would
now effectively halve the CPU resources available to the application.
A 50% decrease is significant, and under such circumstances the
incremental mode is potentially interesting.
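
The arithmetic behind the 1-CPU versus 8-core comparison can be
sketched as follows. The model is an assumption made for illustration
only: the application keeps one runnable thread per core, the OS
scheduler shares the cores evenly between all runnable threads, and
cache effects are ignored. The class and method names are
hypothetical.

```java
// Back-of-the-envelope model; names and assumptions are illustrative only.
public class CmsCpuShare {

    /**
     * Share of total CPU left to the application while gcThreads concurrent
     * collector threads are running, assuming one application thread per
     * core and perfectly fair scheduling.
     */
    public static double appShare(int cores, int gcThreads) {
        if (cores < 1 || gcThreads < 0) {
            throw new IllegalArgumentException("need cores >= 1 and gcThreads >= 0");
        }
        return (double) cores / (cores + gcThreads);
    }

    public static void main(String[] args) {
        for (int cores : new int[] {1, 2, 8, 16}) {
            System.out.printf("%2d core(s), 1 concurrent GC thread: application keeps %.1f%% of the CPU%n",
                    cores, 100 * appShare(cores, 1));
        }
    }
}
```

With a single core the application keeps half the CPU during a
concurrent phase, while with 8 cores the loss is in the same ballpark
as the "1 out of 8 cores" estimate above.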

A trade-off, presumably (again I don't know a lot about how
incremental mode is implemented, but I doubt they've avoided this), is
that the total time needed for the mark/sweep once it does run is
higher, such that you retain more floating garbage that might
otherwise have been collected.

-- 
/ Peter Schuller
