cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Effect of adding/removing nodes from cassandra
Date Mon, 07 Sep 2015 15:18:41 GMT
Hi, sorry I have been quite busy and was hopping that someone else could
answer about CPU, as told I never went that deep.

"But if there were in fact 5 nodes then the max throughput per node would
be capped at 25/5 = 5MB/s"
This is wrong imho. Stream are globally capped, independently of the total
nimber of node, at any time. I mean, output of each node is capped to 200
Mb, no matter you node number, so you can always reach this 200 Mb
theoretically, even if streaming to one node.
Yet I know that bandwidth is not often the bottleneck lately, something
else is, not sure what, you might want to "watch" this issue
https://issues.apache.org/jira/browse/CASSANDRA-9766.

About cpu, if this is not due to Compactions then I have no ideas of what
is happening exactly. Yet, you might want to use
https://github.com/aragozin/jvm-tools to find this out. Running something
like "java -jar sjk-plus-0.3.6.jar ttop -s localhost:7199 -o CPU -n 30"
would probably help.

About “handling all the streams” it was to say something. I have no idea
what happen in there but there a little work to handle streams and write
data... Yet this should not be 20 %, unless your CPU are
ridiculously inefficient / old imho.

I have no knowledge of a documentation describing the bootstrap process. I
use Datastax docs a lot (
http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html)
but I don't think they go that deep, it is more a user guide than a
technical description. You might want to have a look at the code...

Please keep us posted, your work is interesting (plus linked to some recent
issues about over throttled bootstraps).

C*heers,

Alain

2015-09-02 22:33 GMT+02:00 Ahamed, Aadil <aahamed@akamai.com>:

> Hi Alain,
>
> Thank you for your prompt and detailed response. I found it very helpful.
> As for your questions:- I am interning at Akamai. I can’t say too much
> about the use case but we are looking at Cassandra as an option to serve
> large volumes of sustained data.
>
> My Cassandra version is:  *2.1.7 *(thanks for the tip)
>
> As a follow up to your response:
>
> *Stream Throughput*
> I ran nodetool getstreamthroughput and the output was 200Mb/s = 25 MB/s so
> it is weird that my throughput is capped at 5MB/s. However, I do have a
> guess for why this is occurring – Initially I had a 5 node cluster but I
> decommissioned 2 nodes for the purpose of this test. But if there were in
> fact 5 nodes then the max throughput per node would be capped at 25/5 =
> 5MB/s. Based on the issue you raised it does not look like Cassandra can
> adapt the per node streaming throughput (5MB/s) to take full advanatage of
> the total stream throughput (25 MB/s). This still raises the question: Why
> isn’t the per node stream throughput changed to 25/3 = 8.33MB/s after 2
> nodes are decommissioned from the 5 node cluster? Does it have something to
> do with  Cassandra expecting decommissioned nodes to rejoin the cluster
> eventually? Of course, my guess could be wrong and the problem could be a
> result of a completely different issue.
>
> *CPU usage for Bootstrapping*
> I have tried adding nodes with auto-compaction disabled (nodetool
> disableautocompaction) and it still results in similar amounts of cpu usage
> so I don’t think compaction is the main cause. In addition, you said that
>  handling all the stream also consume CPU. Could you explain a little more
> on what you mean by “handling all the streams”?. Also, if you know any
> sites/documents that have more in depth information about the bootstrap
> process please let me know.
>
> Thanks again for your help. I really appreciate it.
> Aadil
>
>
>
> From: Alain RODRIGUEZ <arodrime@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, September 1, 2015 at 5:26 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Effect of adding/removing nodes from cassandra
>
> Hi Aadil, and welcome then !
>
> *Graph for initial node 1, 2:*
>
> 1. You can set "stream_throughput_outbound_megabits_per_sec" in
> cassandra.yaml configuration for permanent change. The point here is not to
> take all the bandwidth while adding a node to let transactional network to
> run as normally as possible. Also look at "nodetool setstreamthroughput"
> (and "getstreamthroughput") if you want to adjust during an operation
> without restarting the node. This will be overridden on node restart, back
> to the value  in the cassandra.yaml
>
> This limit the outgoing traffic, meaning that this will increase for each
> node you add, if using vnodes, on certain operations like repair,
> removenode, ... You might want to take this into consideration. I created a
> not very popular issue about this, at least I detail the issue there -->
> https://issues.apache.org/jira/browse/CASSANDRA-9509
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_CASSANDRA-2D9509&d=BQMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=IPrliSk1Ijg-U4peH4IIW6Lcj4i6GMuxpxu3zDH-l0w&m=_ELpGYm1Ig2bI4sGvyp4xAIzkPmj_nSWRjbGFs6HsRo&s=rRMLh3n6Lvl-M44A8_RDyeBfOvRthK06Nhdo82zWdFk&e=>.
> In your exemple see that you receive 2 * 5 = 10 MB/s, then one node
> finishes, and you keep receiving 5 MB/s.
>
> Though, default is 200 Mbps (25 MB) , not sure why you are limited to 5 MB
> there, unless you changed this or I miscalculated something, this use to
> happen to me :D... Also maybe something relative to
> https://issues.apache.org/jira/browse/CASSANDRA-9766
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_CASSANDRA-2D9766&d=BQMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=IPrliSk1Ijg-U4peH4IIW6Lcj4i6GMuxpxu3zDH-l0w&m=_ELpGYm1Ig2bI4sGvyp4xAIzkPmj_nSWRjbGFs6HsRo&s=ETZFjucAoPhvGCfFFjgK2gbS2RFf-tVB16gsyHfUec4&e=>
> (btw, ou might want to add your version of Cassandra when posting, it would
> be easier to point you to known bugs or to the right options etc)
>
> *Graph for added node:*
>
> 1. Adding a node is CPU intensive I guess it is mainly because you have to
> compact all the data you accumulated quite fast on the bootstrapping node
> (maybe there is more reason, other people might explain this better, I
> never needed to go that deep, I imagine handling all the stream also
> consume CPU)
>
> 2. You partially answered yourself, one node ended streaming. The other
> half of the question, about the reason why the other node don't send more
> data is explained above. Limitations are on outgoing traffic, so this looks
> normal to me.
>
> The only weird thing to me is the threshold of you streaming throughput,
> unless you changed it.
>
> Btw, I am very curious, please feel free not to answer this if you are
> under a NDA or whatever. Are you working for Akamai ? What's your use case
> for Cassandra ?
>
> Hope this will help !
>
> C*heers,
>
> Alain
>
>
> Hi,
>
> This is my first post to this mailing list so I want to apologize in
> advance if I break any rules or guidelines. I have also inserted images of
> graphs and I am not sure if they will show up. Please let me know if I can
> improve my post in any way.
>
> I am investigating the effect of adding and removing nodes from a
> Cassandra cluster by collecting metrics on cpu utilization, memory usage
> etc.
> Basically I want to have a quantitative measurement of how intensive the
> add/remove operation is so we know what to expect when
> increasing/decreasing capacity on our production cluster.
> I wrote a tool to measure this effect but I need help interpreting the
> data I have collected.
>
> Here is a sample testcase:
>
> *Relevant Machine Information:*
> Logical cores: 8
> Core model: Intel Xeon(R) CPU E31270 @ 3.4GHz
> RAM: 16GB
> Non-Volatile storage: Hard Disk
>
> *Test Info:*
> Keyspace: Simple strategy with replication factor 2
> Initial no. of nodes on cluster: 2 [198.18.71.132, 198.18.71.133]
> Initial load on each node: 30GB
> No. of added nodes: 1 [198.18.71.134]
> Final load on each node: 18GB
>
> *The x-axis for all graphs is in seconds
>
> *Graph for initial node 1:*
>
>
> *Graph for initial node 2:*
>
>
> *Graph for added node:*
>
>
> *Here are my questions:*
>
> *Graph for initial node 1, 2:*
>
> 1. Why is the KBytes/s sent over the network stay constant at around ~5000
> KB/s? Can I change it to something higher?
>
> *Graph for added node:*
>
> 1.  Why is the cpu usage so high?
> It is close to 30% for ~800 seconds and 10% for another 400 seconds. I did
> not expect the cpu usage to be this high and I did not expect it to stay
> high for such a long period of time. Based on my understanding the only cpu
> intensive process during bootstrap is token range recalculation. However,
> based on the graph the cpu usage seems to be proportional to the amount of
> data that is streamed to the node at any given moment in time.
>
> 2. Why does the Kbytes/s received over the network drop from - ~10000 ->
> ~5000  - at around 800 seconds?
> I can see that initial node 1 stops streaming data at around 800 seconds
> but why does initial node 2 not bump up its outgoing rate of transfer?
>
>
> Thanks for your help,
> Aadil
>
>

Mime
View raw message