cassandra-user mailing list archives

From Reid Pinchback <rpinchb...@tripadvisor.com>
Subject Re: Predicting Read/Write Latency as a Function of Total Requests & Cluster Size
Date Tue, 10 Dec 2019 15:58:27 GMT
Latency SLAs are very much *not* Cassandra’s sweet spot; scaling throughput and storage is
more where C*’s strengths shine.  If you only care about median latency, you’ll find things a
bit more amenable to modeling, but not if you have 2-nines (p99), and particularly not 3-nines
(p99.9), SLA expectations.  Basically, the harder you push on the nodes, the more you get sporadic
but non-ignorable timing artifacts from garbage collection pauses and from IO stalls, when flushing
writes chokes out disk reads.  Also, running in AWS, you’ll find that noisy neighbors are a routine
issue no matter what the specifics of your use case are.
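To illustrate why the median is the easier target, here is a minimal Python simulation under made-up assumptions: baseline latency is lognormal with a median of about 2 ms, and a small fraction of requests (`stall_prob`, a hypothetical stand-in for GC pauses or flush/read IO contention) picks up an extra 50 to 200 ms. The function names and numbers are illustrative only, not measurements of any cluster.

```python
import math
import random


def percentile(samples, q):
    """Nearest-rank percentile of `samples` for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def simulate_latencies(n, stall_prob, seed=42):
    """Hypothetical latency model (ms): a ~2 ms lognormal baseline, plus a
    rare 50-200 ms stall standing in for a GC pause or flush/read IO contention."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(n):
        latency_ms = rng.lognormvariate(math.log(2.0), 0.3)  # median 2 ms
        if rng.random() < stall_prob:
            latency_ms += rng.uniform(50.0, 200.0)  # sporadic stall
        latencies.append(latency_ms)
    return latencies


if __name__ == "__main__":
    # Higher stall_prob is a stand-in for pushing the nodes harder.
    for stall_prob in (0.0, 0.005, 0.02):
        lat = simulate_latencies(100_000, stall_prob)
        print(f"stall_prob={stall_prob:<6} "
              f"p50={percentile(lat, 50):6.2f} ms  "
              f"p99={percentile(lat, 99):7.2f} ms  "
              f"p99.9={percentile(lat, 99.9):7.2f} ms")
```

Raising `stall_prob` barely moves p50, while p99.9 lands in the stall range first and p99 follows once stalls affect more than roughly 1% of requests.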

Your actual data model, your patterns of reads and writes, the impact of deletes and TTLs
requiring tombstone cleanup, etc. all dramatically change the picture.

If you aren’t already aware of it, there is something called cassandra-stress that can help
you do some experiments. The challenge, though, is determining whether the experiments are representative
of what your actual usage will be.  Because of the GC issues in anything implemented in a
JVM or interpreter, it’s pretty easy to fall off the cliff of relevance.  The Last Pickle (TLP) wrote
an article about some of these challenges with cassandra-stress:

https://thelastpickle.com/blog/2017/02/08/Modeling-real-life-workloads-with-cassandra-stress.html

Note that one way to not have to care so much about variable latency is to use speculative
retry, where the coordinator sends a redundant read to another replica when the first one is slow.
Basically you’re trading away some of your throughput to help achieve a latency SLA.  The benefit
of that tradeoff breaks down when you get to 3 nines.
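A rough Python sketch of the speculative-retry tradeoff, under stated assumptions: each replica read follows a hypothetical toy latency model (about 2 ms typical, with 2% of reads hitting a 50 to 200 ms stall), a read still outstanding after a fixed `retry_after_ms` is re-sent to a second replica, and the client keeps whichever answer arrives first. The names and the fixed 10 ms threshold are illustrative; in Cassandra the behavior is set per table through the `speculative_retry` option, which can also derive the threshold from observed latency percentiles.

```python
import math
import random


def percentile(samples, q):
    """Nearest-rank percentile of `samples` for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def replica_latency(rng, stall_prob=0.02):
    """Hypothetical single-replica read latency in ms: ~2 ms lognormal
    baseline plus a rare 50-200 ms stall."""
    latency_ms = rng.lognormvariate(math.log(2.0), 0.3)
    if rng.random() < stall_prob:
        latency_ms += rng.uniform(50.0, 200.0)
    return latency_ms


def simulate(n, retry_after_ms=None, seed=7):
    """Return (latencies_ms, extra_request_fraction).

    If retry_after_ms is set, a read still outstanding after that delay is
    re-sent to a second replica and the client takes whichever answer arrives
    first. The second replica is assumed independent of the first (optimistic).
    """
    rng = random.Random(seed)
    latencies, retries = [], 0
    for _ in range(n):
        first = replica_latency(rng)
        if retry_after_ms is not None and first > retry_after_ms:
            retries += 1
            second = retry_after_ms + replica_latency(rng)
            latencies.append(min(first, second))
        else:
            latencies.append(first)
    return latencies, retries / n


if __name__ == "__main__":
    for label, retry in (("no retry", None), ("retry@10ms", 10.0)):
        lat, extra = simulate(100_000, retry_after_ms=retry)
        print(f"{label:<11} p50={percentile(lat, 50):6.2f} ms  "
              f"p99={percentile(lat, 99):7.2f} ms  extra reads={extra:.1%}")
```

The extra-reads fraction is the throughput cost, since every retry is one more replica read. The independence assumption is the optimistic part: when stalls are correlated across replicas (cluster-wide load, shared noisy neighbors), the second read helps much less, which plausibly contributes to the benefit eroding further out in the tail.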

I’m actually hoping to start on some modeling of what the latency surface looks like under
different assumptions in the new year, not because I expect the specific numbers to translate
to anybody else, but just to show how the underlying dynamics show up in metrics
when C* nodes are under duress.
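For a first-order feel for such a latency surface, and for the two questions quoted below, one could start from textbook queueing theory. This is a swapped-in technique, not Reid's model and not anything Cassandra provides: it treats each node as an independent M/M/1 queue with a hypothetical per-node capacity `service_rate_per_node`, assumes requests spread evenly across nodes, and ignores replication factor, consistency level, compaction, GC and noisy neighbors, which are exactly the effects that dominate the tail.

```python
import math


def mm1_latency_ms(total_rps, nodes, service_rate_per_node, quantile=None):
    """Per-request latency (ms) under a toy M/M/1-per-node model.

    Assumes load splits evenly over `nodes`, each serving exponentially
    distributed work at `service_rate_per_node` requests/s. Returns the mean
    sojourn time, or the given quantile (e.g. 0.99) of it, since M/M/1
    sojourn time is exponential with rate (mu - lambda).
    """
    arrival = total_rps / nodes
    if arrival >= service_rate_per_node:
        return math.inf  # saturated: the queue grows without bound
    rate = service_rate_per_node - arrival  # per second
    if quantile is None:
        return 1000.0 / rate
    return 1000.0 * -math.log(1.0 - quantile) / rate


def nodes_needed(total_rps, service_rate_per_node, target_ms,
                 quantile=None, max_nodes=1000):
    """Smallest node count meeting target_ms under the model, or None."""
    for n in range(1, max_nodes + 1):
        if mm1_latency_ms(total_rps, n, service_rate_per_node, quantile) <= target_ms:
            return n
    return None


if __name__ == "__main__":
    mu = 5000.0  # hypothetical per-node capacity, requests/s
    print(mm1_latency_ms(30_000, 8, mu))            # mean at current load: 0.8
    print(mm1_latency_ms(36_000, 8, mu))            # +20% load: 2.0
    print(nodes_needed(36_000, mu, target_ms=0.8))  # nodes to restore 0.8 ms: 10
```

Even this idealized model is strongly nonlinear: 20% more load raises per-node utilization from 75% to 90% and multiplies mean latency by 2.5. Real clusters behave worse in the tail for the reasons above, so any service rate or fitted curve should come from your own measurements (for example cassandra-stress runs at several load levels), not from this sketch.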

R


From: Fred Habash <fmhabash@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, December 10, 2019 at 9:57 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Predicting Read/Write Latency as a Function of Total Requests & Cluster Size

I'm looking for an empirical way to answer these two questions:

1. If I increase application workload (read/write requests) by some percentage, how is it
going to affect read/write latency? This assumes all other factors remain constant, e.g. EC2
instance class, SSD specs, number of nodes, etc.

2. How many nodes do I have to add to maintain a given read/write latency?

Are there any methods or instruments out there that can help answer these questions?



----------------------------------------
Thank you
