cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ahamed, Aadil" <>
Subject Re: Effect of adding/removing nodes from cassandra
Date Wed, 02 Sep 2015 20:33:29 GMT
Hi Alain,

Thank you for your prompt and detailed response. I found it very helpful.
As for your questions:- I am interning at Akamai. I can’t say too much about the use case
but we are looking at Cassandra as an option to serve large volumes of sustained data.

My Cassandra version is:  2.1.7 (thanks for the tip)

As a follow up to your response:

Stream Throughput
I ran nodetool getstreamthroughput and the output was 200Mb/s = 25 MB/s so it is weird that
my throughput is capped at 5MB/s. However, I do have a guess for why this is occurring –
Initially I had a 5 node cluster but I decommissioned 2 nodes for the purpose of this test.
But if there were in fact 5 nodes then the max throughput per node would be capped at 25/5
= 5MB/s. Based on the issue you raised it does not look like Cassandra can adapt the per node
streaming throughput (5MB/s) to take full advanatage of the total stream throughput (25 MB/s).
This still raises the question: Why isn’t the per node stream throughput changed to 25/3
= 8.33MB/s after 2 nodes are decommissioned from the 5 node cluster? Does it have something
to do with  Cassandra expecting decommissioned nodes to rejoin the cluster eventually? Of
course, my guess could be wrong and the problem could be a result of a completely different

CPU usage for Bootstrapping
I have tried adding nodes with auto-compaction disabled (nodetool disableautocompaction) and
it still results in similar amounts of cpu usage so I don’t think compaction is the main
cause. In addition, you said that  handling all the stream also consume CPU. Could you explain
a little more on what you mean by “handling all the streams”?. Also, if you know any sites/documents
that have more in depth information about the bootstrap process please let me know.

Thanks again for your help. I really appreciate it.

From: Alain RODRIGUEZ <<>>
Reply-To: "<>" <<>>
Date: Tuesday, September 1, 2015 at 5:26 AM
To: "<>" <<>>
Subject: Re: Effect of adding/removing nodes from cassandra

Hi Aadil, and welcome then !

Graph for initial node 1, 2:

1. You can set "stream_throughput_outbound_megabits_per_sec" in cassandra.yaml configuration
for permanent change. The point here is not to take all the bandwidth while adding a node
to let transactional network to run as normally as possible. Also look at "nodetool setstreamthroughput"
(and "getstreamthroughput") if you want to adjust during an operation without restarting the
node. This will be overridden on node restart, back to the value  in the cassandra.yaml

This limit the outgoing traffic, meaning that this will increase for each node you add, if
using vnodes, on certain operations like repair, removenode, ... You might want to take this
into consideration. I created a not very popular issue about this, at least I detail the issue
there --><>.
In your exemple see that you receive 2 * 5 = 10 MB/s, then one node finishes, and you keep
receiving 5 MB/s.

Though, default is 200 Mbps (25 MB) , not sure why you are limited to 5 MB there, unless you
changed this or I miscalculated something, this use to happen to me :D... Also maybe something
relative to<>
(btw, ou might want to add your version of Cassandra when posting, it would be easier to point
you to known bugs or to the right options etc)

Graph for added node:

1. Adding a node is CPU intensive I guess it is mainly because you have to compact all the
data you accumulated quite fast on the bootstrapping node (maybe there is more reason, other
people might explain this better, I never needed to go that deep, I imagine handling all the
stream also consume CPU)

2. You partially answered yourself, one node ended streaming. The other half of the question,
about the reason why the other node don't send more data is explained above. Limitations are
on outgoing traffic, so this looks normal to me.

The only weird thing to me is the threshold of you streaming throughput, unless you changed

Btw, I am very curious, please feel free not to answer this if you are under a NDA or whatever.
Are you working for Akamai ? What's your use case for Cassandra ?

Hope this will help !




This is my first post to this mailing list so I want to apologize in advance if I break any
rules or guidelines. I have also inserted images of graphs and I am not sure if they will
show up. Please let me know if I can improve my post in any way.

I am investigating the effect of adding and removing nodes from a Cassandra cluster by collecting
metrics on cpu utilization, memory usage etc.
Basically I want to have a quantitative measurement of how intensive the add/remove operation
is so we know what to expect when increasing/decreasing capacity on our production cluster.
I wrote a tool to measure this effect but I need help interpreting the data I have collected.

Here is a sample testcase:

Relevant Machine Information:
Logical cores: 8
Core model: Intel Xeon(R) CPU E31270 @ 3.4GHz
Non-Volatile storage: Hard Disk

Test Info:
Keyspace: Simple strategy with replication factor 2
Initial no. of nodes on cluster: 2 [,]
Initial load on each node: 30GB
No. of added nodes: 1 []
Final load on each node: 18GB

*The x-axis for all graphs is in seconds

Graph for initial node 1:


Graph for initial node 2:


Graph for added node:

Here are my questions:

Graph for initial node 1, 2:

1. Why is the KBytes/s sent over the network stay constant at around ~5000 KB/s? Can I change
it to something higher?

Graph for added node:

1.  Why is the cpu usage so high?
It is close to 30% for ~800 seconds and 10% for another 400 seconds. I did not expect the
cpu usage to be this high and I did not expect it to stay high for such a long period of time.
Based on my understanding the only cpu intensive process during bootstrap is token range recalculation.
However, based on the graph the cpu usage seems to be proportional to the amount of data that
is streamed to the node at any given moment in time.

2. Why does the Kbytes/s received over the network drop from - ~10000 -> ~5000  - at around
800 seconds?
I can see that initial node 1 stops streaming data at around 800 seconds but why does initial
node 2 not bump up its outgoing rate of transfer?

Thanks for your help,

View raw message