cassandra-user mailing list archives

From Romain Hardouin <>
Subject Re: Attached profiled data but need help understanding it
Date Thu, 02 Mar 2017 19:41:00 GMT
Hi Kant,
> By backporting you mean I should cherry pick the CASSANDRA-11966 commit and compile from source?
Regarding the network utilization: you checked throughput, but latency matters more for
LWT. That's why you should make sure your m4 instances (both C* and client) are using the ixgbevf
driver (enhanced networking).
I agree 1500 writes/s is not impressive, but 4 vCPUs is low. It depends on the workload, but
my experience is that an AWS instance starts to be powerful at 16 vCPUs (e.g. c3.4xlarge).
And beware of EBS (again, that's my experience YMMV).
High park/unpark is a sign of excessive context switching. If I were you I would run an LWT
benchmark with 3 x c3.4xlarge or c3.8xlarge (32 vCPUs, SSD instance store). Spawn spot instances
to save money, and be sure to tune cassandra.yaml accordingly, e.g. concurrent_writes (see the sketch below).
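For reference, here is a minimal cassandra.yaml sketch of the kind of tuning I mean. The values
are illustrative assumptions for a 32-vCPU, instance-store node, not a recommendation; start from
the rules of thumb in the yaml comments and measure.

    # concurrent_writes: the usual rule of thumb is ~8 x number of cores
    concurrent_writes: 256
    # concurrent_reads: the usual starting point is ~16 x number of data disks
    concurrent_reads: 32
    # keep compaction from stealing CPU from the write path (also tunable at runtime)
    compaction_throughput_mb_per_sec: 64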
Finally, a naive question but I must ask you... are you really sure you need LWT? Can't you
achieve your goal without it?
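(To make the LWT cost concrete, here is a minimal Java driver sketch -- the keyspace/table and
class names are made up for illustration -- contrasting a plain insert with a conditional one.
The IF NOT EXISTS variant runs a Paxos round on every write, which is exactly the extra latency
and CPU you are paying for.)

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;

    public class LwtSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("my_ks")) {
                // Plain write: one round trip, last-write-wins semantics
                session.execute("INSERT INTO users (id, name) VALUES (1, 'kant')");

                // LWT write: Paxos prepare/propose/commit runs under the hood,
                // which is why throughput drops sharply compared to plain inserts
                ResultSet rs = session.execute(
                        "INSERT INTO users (id, name) VALUES (1, 'kant') IF NOT EXISTS");
                System.out.println("applied? " + rs.wasApplied());
            }
        }
    }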


    On Thursday, 2 March 2017 at 10:31, Kant Kodali <> wrote:

 Hi Romain,
Any ideas on this? I am not sure why so much time is being spent in the park and unpark
methods shown in the thread dump. Also, could you please look into my responses from the other
email? It would greatly help.
On Tue, Feb 28, 2017 at 10:20 PM, Kant Kodali <> wrote:

Hi Romain,
I am using Cassandra version 3.0.9, and here is the generated report (graphical view) of
my thread dump as well. Just sending this over in case it helps.
On Tue, Feb 28, 2017 at 7:51 PM, Kant Kodali <> wrote:

Hi Romain,
Thanks again. My responses are inline.

On Tue, Feb 28, 2017 at 10:04 AM, Romain Hardouin <> wrote:

> we are currently using 3.0.9.  should we use 3.8 or 3.10
No, don't use 3.X in production unless you really need a major feature. I would advise sticking
to 3.0.X (i.e. 3.0.11 now). You can backport CASSANDRA-11966 easily, but of course you have
to deploy from source as a prerequisite.

   By backporting you mean I should cherry pick the CASSANDRA-11966 commit and compile from source?

> I haven't done any tuning yet.
So that's good news, because maybe there is room for improvement.
> Can I change this on a running instance? If so, how? or does it require a downtime?
You can throttle compaction at runtime with "nodetool setcompactionthroughput". Be sure to
read about all the nodetool commands; some of them are really useful for day-to-day tuning/management. 
If GC is fine, then check other things -> "[...] different pool sizes for NTR, concurrent
reads and writes, compaction executors, etc. Also check if you can improve network latency
(e.g. VF or ENA on AWS)."
Regarding thread pools, some of them can be resized at runtime via JMX.
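As an illustration, here is a bare-bones JMX client sketch in Java. The MBean and attribute
names (org.apache.cassandra.request:type=MutationStage, CorePoolSize) are assumptions on my part
for 3.0.x -- browse the MBeans in jconsole first to confirm which pools on your version expose
writable sizes.

    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ResizePoolSketch {
        public static void main(String[] args) throws Exception {
            // 7199 is Cassandra's default JMX port
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Assumed MBean/attribute names -- confirm in jconsole before using
                ObjectName pool = new ObjectName("org.apache.cassandra.request:type=MutationStage");
                mbs.setAttribute(pool, new Attribute("CorePoolSize", 64));
                System.out.println("CorePoolSize = " + mbs.getAttribute(pool, "CorePoolSize"));
            } finally {
                connector.close();
            }
        }
    }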
> 5000 is the target.
Right now you reached 1500. Is that per node or for the cluster? We don't know your setup, so
it's hard to say whether it's doable. Can you provide more details? VMs, physical nodes, #nodes, etc.
Generally speaking, LWT should be seldom used. AFAIK you won't achieve 10,000 writes/s per node.
Maybe someone on the list has already done some tuning for a heavy LWT workload?

    1500 total for the cluster.  
    I have an 8-node Cassandra cluster. Each node is an AWS m4.xlarge instance (so 4 vCPUs, 16GB RAM,
1 Gbit network = 125 MB/s).
    I have 1 node (m4.xlarge) for my application, which just inserts a bunch of data;
each insert is an LWT.
    I tested the network throughput of the node. I can get up to 98 MB/s.
    Now, when I start my application, I see that the Cassandra nodes' receive rate/throughput
is about 4 MB/s (yes, that is megabytes; I checked this by running sudo iftop -B). Disk
I/O is about the same, and the Cassandra process CPU usage is about 360% (the max is 400% since
it is a 4-core machine). The application node's transmit throughput is about 6 MB/s. So even
with a 4 MB/s receive throughput at each Cassandra node, the CPU is almost maxed out. I am not sure
what this says about Cassandra, but what I can tell is that the network is way underutilized
and that 8 nodes are unnecessary, so we plan to bring it down to 4 nodes, except each node this
time will have 8 cores. All said, I am still not sure how to scale up from 1500 writes/sec.  

