cassandra-user mailing list archives

From James Masson
Subject Re: Cassandra read throughput with little/no caching.
Date Mon, 31 Dec 2012 10:05:12 GMT

Hi Yiming,

I've had the chance to observe how Cassandra read response time changes
over time.

It starts out with fast 1ms reads until the first compaction starts.
The CPUs are then maxed out for a period, and read latency rises to 4ms.
After compaction finishes, the system returns to 1ms reads and low CPU use.

This cycle repeats a few more times, but eventually compactions become
more and more infrequent and read latency stays stuck at 4ms for the
rest of the batch operation.

I understand why compaction occurs, but not why it takes so long for our
dataset, or why read latency eventually stops returning to its original
level.

Our dataset just about fits in each node's disk cache. Compaction
should therefore be a matter of memory and CPU bandwidth, bottlenecked
at worst by disk writes. Yet I see near-zero disk I/O, and the SAN can
easily sustain 100Mb/s writes.

I'm using a fairly stock Cassandra config.

I'm tempted to just set the compaction throttle below to unlimited (0):

# Throttles compaction to the given total throughput across the entire
# system. The faster you insert data, the faster you need to compact in
# order to keep the sstable count down, but in general, setting this to
# 16 to 32 times the rate you are inserting data is more than sufficient.
# Setting this to 0 disables throttling. Note that this accounts for all
# types of compaction, including validation compaction.
compaction_throughput_mb_per_sec: 16
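
The "16 to 32 times the insert rate" guideline in that comment can be
turned into a quick calculation. This is only a rough sketch: the
function name is made up for illustration, and the 0.5 MB/s insert rate
is a hypothetical figure, not something measured on this cluster.

```python
def compaction_throughput_range(insert_mb_per_sec):
    """Return (low, high) throttle values in MB/s.

    Applies the 16x to 32x guideline from the cassandra.yaml comment
    to a given insert rate.
    """
    return 16 * insert_mb_per_sec, 32 * insert_mb_per_sec


# Hypothetical insert rate (assumption, not a measurement).
low, high = compaction_throughput_range(0.5)
print(f"compaction_throughput_mb_per_sec between {low:g} and {high:g}")
```

Read the other way round, the stock value of 16 matches the guideline
for insert rates of roughly 0.5 to 1 MB/s.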

About the only thing I have changed is this:

# For workloads with more data than can fit in memory, Cassandra's
# bottleneck will be reads that need to fetch data from
# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
# order to allow the operations to enqueue low enough in the stack
# that the OS and drives can reorder them.
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 128
concurrent_writes: 32
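
For reference, the two rules of thumb in that comment work out as
follows. This is a minimal sketch: the function name is hypothetical,
the 2 cores come from the VM spec mentioned in this thread, and the
single drive is an assumption.

```python
def rule_of_thumb(number_of_drives, number_of_cores):
    """Apply the cassandra.yaml guidance quoted above.

    concurrent_reads  = 16 * number_of_drives
    concurrent_writes = 8 * number_of_cores
    """
    return {
        "concurrent_reads": 16 * number_of_drives,
        "concurrent_writes": 8 * number_of_cores,
    }


# 2 cores as discussed in the thread; 1 drive is an assumed value.
suggested = rule_of_thumb(number_of_drives=1, number_of_cores=2)
print(suggested)
```

By the same rule, concurrent_reads: 128 corresponds to 8 drives and
concurrent_writes: 32 to 4 cores.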

On 28/12/12 14:02, Yiming Sun wrote:

> Is there any chance to increase the VM configuration specs?  I couldn't
> pinpoint in exactly which message you mentioned the VMs are 2GB mem and
> 2 cores, which is a bit meager.

The dataset pretty much all fits in RAM, and using 4GHz of CPU time to
serve about 500 key-value pairs per second is pretty poor performance
compared to Cassandra's competitors, no? I'd rather understand why
performance is bad than throw hardware into a black hole!
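
To put a number on that, here is a back-of-the-envelope sketch. It
assumes the 4GHz figure is the combined clock rate of the cores and
that they are fully busy; the function name is made up for
illustration.

```python
def cycles_per_operation(total_hz, ops_per_sec):
    """CPU cycles spent per operation, assuming the CPUs are fully busy."""
    return total_hz / ops_per_sec


# 4GHz of CPU time and 500 key-value pairs per second, as quoted above.
cycles = cycles_per_operation(4e9, 500)
print(f"{cycles:,.0f} cycles per key-value pair")
```

Under those assumptions that is about 8 million cycles per key-value
pair served.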

>  Also is it possible to batch the writes together?

I'll ask.

Thanks for persevering!

James M
