cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Effect of adding/removing nodes from cassandra
Date Tue, 01 Sep 2015 12:26:15 GMT
Hi Aadil, and welcome!

*Graph for initial node 1, 2:*

1. You can set "stream_throughput_outbound_megabits_per_sec" in the
cassandra.yaml configuration for a permanent change. The point here is not to
take all the bandwidth while adding a node, so that transactional traffic can
run as normally as possible. Also look at "nodetool setstreamthroughput"
(and "getstreamthroughput") if you want to adjust it during an operation
without restarting the node. That change is overridden on node restart, back
to the value in cassandra.yaml.
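For reference, a sketch of both ways to adjust the throttle (exact yaml key and defaults can vary by version, so check yours):

```shell
# Permanent: edit cassandra.yaml (takes effect after restart); value in megabits/s
# stream_throughput_outbound_megabits_per_sec: 200

# Temporary: adjust on the live node, no restart needed (also in megabits/s);
# reverts to the cassandra.yaml value when the node restarts
nodetool getstreamthroughput
nodetool setstreamthroughput 400
```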

This limits outgoing traffic, meaning the aggregate rate will increase for
each node you add, if using vnodes, on certain operations like repair,
removenode, ... You might want to take this into consideration. I created a
not very popular issue about this; at least I detail the problem there -->
https://issues.apache.org/jira/browse/CASSANDRA-9509. In your example, see
that you receive 2 * 5 = 10 MB/s, then one node finishes, and you keep
receiving 5 MB/s.

Though, the default is 200 Mbps (25 MB/s), so I am not sure why you are
limited to 5 MB/s there, unless you changed it or I miscalculated something
(this used to happen to me :D)... It might also be related to
https://issues.apache.org/jira/browse/CASSANDRA-9766 (btw, you might want to
add your version of Cassandra when posting; it would be easier to point you
to known bugs or the right options, etc.).
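Just to show the arithmetic I used (a quick sanity check, nothing Cassandra-specific):

```shell
# Default throttle is 200 megabits/s per sending node; divide by 8 for megabytes/s
default_mbps=200
default_MBps=$((default_mbps / 8))       # 25 MB/s per sender at the default

# With 2 senders each apparently capped at 5 MB/s, the joining node receives:
per_sender_MBps=5
aggregate_MBps=$((2 * per_sender_MBps))  # 10 MB/s, dropping to 5 once one sender finishes

echo "$default_MBps $aggregate_MBps"
```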

*Graph for added node:*

1. Adding a node is CPU intensive. I guess it is mainly because the
bootstrapping node has to compact, quite fast, all the data it accumulates
(maybe there are more reasons; other people might explain this better, as I
never needed to go that deep. I imagine handling all the streams also
consumes CPU).
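If the compaction load during bootstrap is a concern, there is a similar knob for compaction (value in MB/s; again, check the exact yaml key and defaults for your version):

```shell
# compaction_throughput_mb_per_sec in cassandra.yaml (0 = unthrottled)
nodetool getcompactionthroughput
nodetool setcompactionthroughput 32
```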

2. You partially answered this yourself: one node finished streaming. The
other half of the question, why the remaining node doesn't send more data, is
explained above. Limits apply to outgoing traffic, so this looks normal to
me.

The only thing that looks weird to me is the cap on your streaming
throughput, unless you changed it.

Btw, I am very curious (feel free not to answer this if you are under an NDA
or similar): are you working for Akamai? What's your use case for Cassandra?

Hope this helps!

C*heers,

Alain


Hi,

This is my first post to this mailing list, so I want to apologize in
advance if I break any rules or guidelines. I have also inserted images of
graphs and I am not sure whether they will show up. Please let me know if I
can improve my post in any way.

I am investigating the effect of adding and removing nodes from a Cassandra
cluster by collecting metrics on CPU utilization, memory usage, etc.
Basically, I want a quantitative measurement of how intensive the add/remove
operation is, so we know what to expect when increasing or decreasing
capacity on our production cluster. I wrote a tool to measure this effect,
but I need help interpreting the data I have collected.

Here is a sample testcase:

*Relevant Machine Information:*
Logical cores: 8
Core model: Intel Xeon(R) CPU E31270 @ 3.4GHz
RAM: 16GB
Non-Volatile storage: Hard Disk

*Test Info:*
Keyspace: Simple strategy with replication factor 2
Initial no. of nodes on cluster: 2 [198.18.71.132, 198.18.71.133]
Initial load on each node: 30GB
No. of added nodes: 1 [198.18.71.134]
Final load on each node: 18GB
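As a back-of-the-envelope check on these numbers (my own arithmetic, assuming data stays evenly balanced across nodes):

```shell
# 2 nodes * 30 GB each = 60 GB of replicated data in the cluster
# (RF=2 is already reflected in the per-node load figures)
total_gb=$((2 * 30))
nodes_after=3
expected_per_node=$((total_gb / nodes_after))  # 20 GB expected; I observe ~18 GB,
                                               # presumably after compaction
echo "$expected_per_node"
```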

*The x-axis for all graphs is in seconds

*Graph for initial node 1:*


*Graph for initial node 2:*


*Graph for added node:*


*Here are my questions:*

*Graph for initial node 1, 2:*

1. Why does the KB/s sent over the network stay constant at around ~5000
KB/s? Can I change it to something higher?

*Graph for added node:*

1. Why is the CPU usage so high?
It is close to 30% for ~800 seconds and 10% for another 400 seconds. I did
not expect the CPU usage to be this high, nor to stay high for such a long
period of time. Based on my understanding, the only CPU-intensive process
during bootstrap is token range recalculation. However, based on the graph,
the CPU usage seems to be proportional to the amount of data streamed to the
node at any given moment.

2. Why does the KB/s received over the network drop from ~10000 to ~5000 at
around 800 seconds?
I can see that initial node 1 stops streaming data at around 800 seconds, but
why does initial node 2 not bump up its outgoing rate of transfer?


Thanks for your help,
Aadil
