incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Fales <Peter.Fa...@alcatel-lucent.com>
Subject Re: Cassandra stress test and max vs. average read/write latency.
Date Fri, 23 Dec 2011 15:00:19 GMT
Peter,

Thanks for your response. I'm looking into some of the ideas in your
other recent mail, but I had another followup question on this one...

Is there any way to control the CPU load when using the "stress" benchmark?
I have some control over that with our home-grown benchmark, but I
thought it made sense to use the official benchmark tool as people might
more readily believe those results and/or be able to reproduce them.  But
offhand, I don't see any to throttle back the load created by the 
stress test.

On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote:
> > I'm trying to understand if this is expected or not, and if there is
> 
> Without careful tuning, outliers around a couple of hundred ms are
> definitely expected in general (not *necessarily*, depending on
> workload) as a result of garbage collection pauses. The impact will be
> worsened a bit if you are running under high CPU load (or even maxing
> it out with stress) because post-pause, if you are close to max CPU
> usage you will take considerably longer to "catch up".
> 
> Personally, I would just log each response time and feed it to gnuplot
> or something. It should be pretty obvious whether or not the latencies
> are due to periodic pauses.
> 
> If you are concerned with eliminating or reducing outliers, I would:
> 
> (1) Make sure that when you're benchmarking, that you're putting
> Cassandra under a reasonable amount of load. Latency benchmarks are
> usually useless if you're benchmarking against a saturated system. At
> least, start by achieving your latency goals at 25% or less CPU usage,
> and then go from there if you want to up it.
> 
> (2) One can affect GC pauses, but it's non-trivial to eliminate the
> problem completely. For example, the length of frequent young-gen
> pauses can typically be decreased by decreasing the size of the young
> generation, leading to more frequent shorter GC pauses. But that
> instead causes more promotion into the old generation, which will
> result in more frequent very long pauses (relative to normal; they
> would still be infrequent relative to young gen pauses) - IF your
> workload is such that you are suffering from fragmentation and
> eventually seeing Cassandra fall back to full compacting GC:s
> (stop-the-world) for the old generation.
> 
> I would start by adjusting young gen so that your frequent pauses are
> at an acceptable level, and then see whether or not you can sustain
> that in terms of old-gen.
> 
> Start with this in any case: Run Cassandra with -XX:+PrintGC
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
> 
> -- 
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: Peter.Fales@alcatel-lucent.com
Phone: 630 979 8031

Mime
View raw message