cassandra-user mailing list archives

From "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <bjano...@cisco.com>
Subject Re: Cluster scaling
Date Wed, 08 Feb 2017 21:55:44 GMT
Hi Anuj,

Thank you for the response. I modify the RF only to see its effect on performance;
there is no data in the datastore when I change its value. But I see my mistake and will definitely
change it as you suggested.
Reads will not be used that much; we will mostly write to the datastore. The batching
also needs some work, so the numbers will change, sorry about that.

I just wanted to see what this “linear scalability” is, because when I include the read operation
as well I don’t see the scaling. So is the scaling based only on writes to the datastore?

My main question, probably for everybody, is: do these numbers seem reasonable to you?

Cheers,
Branislav


From: Anuj Wadehra <anujw_2003@yahoo.co.in>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>, "anujw_2003@yahoo.co.in"
<anujw_2003@yahoo.co.in>
Date: Wednesday, February 8, 2017 at 9:42 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>, "Branislav Janosik -T (bjanosik
- AAP3 INC at Cisco)" <bjanosik@cisco.com>, "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cluster scaling

Hi Branislav,

I quickly went through the code and noticed that you are updating the RF from code and expecting
Cassandra to automatically distribute replicas according to the new RF. I don't think that is
how it works. After updating the RF, you need to run repair on all the nodes to make sure
that the data replicas match the new RF. Please refer to https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html
for the procedure. This would give you reliable results.
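
As a rough illustration of the step described above, the following Java sketch only builds the CQL text for changing a keyspace's replication factor and prints a reminder about the follow-up repair. The keyspace name `bench_ks`, the class and method names, and the use of SimpleStrategy are assumptions made for this example; they are not taken from the benchmark tool.

```java
// Minimal sketch: builds the CQL text for changing a keyspace's replication factor.
// "bench_ks" and SimpleStrategy are assumptions for illustration only.
class ReplicationChange {
    static String alterKeyspaceCql(String keyspace, int rf) {
        if (rf < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return String.format(
            "ALTER KEYSPACE %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': %d};",
            keyspace, rf);
    }

    public static void main(String[] args) {
        // Statement to execute through cqlsh or a driver session.
        System.out.println(alterKeyspaceCql("bench_ks", 2));
        // The ALTER alone does not move existing data; repair is run per node afterwards.
        System.out.println("nodetool repair   # run on each node after the ALTER");
    }
}
```

The point of the sketch is the ordering: the schema change comes first, and the repair on every node follows before any measurements are taken.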

It would be good if you explained the exact purpose of your exercise. The tests seem to be more
of academic interest. You are adding several variables to your tests, but each of these parameters
has an entirely different purpose:

1. Batch/no batch depends on business atomicity needs.

2. Read/no read depends on business requirements.

3. RF depends on the fault tolerance needed.


Thanks
Anuj


On Wed, 8 Feb, 2017 at 9:09 PM, Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
<bjanosik@cisco.com> wrote:

Hi all,



I have a cluster of three nodes and would like to ask some questions about its performance.

I wrote a small benchmarking tool in Java that mirrors the (read, write) operations that we do
in the real project.

The problem is that it is not scaling as it should. The program runs two tests: one using a batch
statement and one without.

The operation sequence is: optional select, insert, update, insert. I run the tool on my server
with 128 threads (the number of threads has no influence on the performance), usually creating
100K resources for testing purposes.



The average results (operations per second) with the batch statement are:

Replication Factor = 1    with reading    without reading
1-node cluster            37K             46K
2-node cluster            37K             47K
3-node cluster            39K             70K

Replication Factor = 2    with reading    without reading
2-node cluster            21K             40K
3-node cluster            30K             48K



The average results (operations per second) without the batch statement are:

Replication Factor = 1    with reading    without reading
1-node cluster            31K             20K
2-node cluster            38K             39K
3-node cluster            45K             87K

Replication Factor = 2    with reading    without reading
2-node cluster            19K             22K
3-node cluster            26K             36K
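
One way to read the tables above against the idea of linear scaling is to compute, for each row, the speedup over the single-node result and divide it by the node count. The Java sketch below does this for one column (RF = 1, no batch, without reading); the class and method names are made up for this example, and the figures are simply the ones posted above.

```java
// Sketch: compares measured throughput against ideal linear scaling.
class ScalingCheck {
    // Speedup of an n-node cluster relative to the single-node baseline.
    static double speedup(double baselineOps, double clusterOps) {
        return clusterOps / baselineOps;
    }

    // Fraction of ideal linear scaling achieved (1.0 means perfectly linear).
    static double efficiency(double baselineOps, double clusterOps, int nodes) {
        return speedup(baselineOps, clusterOps) / nodes;
    }

    public static void main(String[] args) {
        // RF = 1, no batch, without reading: ops/sec in thousands for 1, 2 and 3 nodes.
        double[] opsK = {20, 39, 87};
        for (int i = 0; i < opsK.length; i++) {
            int nodes = i + 1;
            System.out.printf("%d node(s): speedup %.2f, efficiency %.2f%n",
                nodes, speedup(opsK[0], opsK[i]), efficiency(opsK[0], opsK[i], nodes));
        }
    }
}
```

An efficiency near 1.0 would indicate roughly linear scaling for that column; the same calculation can be repeated for the other columns by swapping in their values.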



The Cassandra VM specs are: 16 CPUs each; one VM has 16GB of RAM and two have 32GB; at least 30GB
of disk space for each node. The disks are not SSDs, and each VM is on a separate physical server.



The code is available at https://github.com/bjanosik/CassandraBenchTool.git and can be
built with Maven. You can then run the jar in the target directory with:

java -jar target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar

Thank you for any help.
