cassandra-user mailing list archives

From S G <sg.online.em...@gmail.com>
Subject Re: How can I scale my read rate?
Date Mon, 20 Mar 2017 01:35:30 GMT
I am not using prepared statements. There are two reasons for that:

1) https://issues.apache.org/jira/browse/CASSANDRA-3634 tells me that the
performance improvements with prepared statements are capped at about 20% -
so I would not see a drastic difference.
2)
https://docs.datastax.com/en/developer/java-driver/3.1/manual/statements/prepared/
tells me to avoid preparing select queries if I expect a change of columns
in my table down the road.

So I wasn't very keen on trying those out.
If I can expect more than a 20% improvement from prepared statements, please
say so and I will try them too.


I did some more testing to see if my client machines were the bottleneck.
For a 6-node Cassandra cluster (each VM having 8-cores), I got 26,000
reads/sec for all of the following:
1) Client nodes:1, Threads: 60
2) Client nodes:3, Threads: 180
3) Client nodes:5, Threads: 300
4) Client nodes:10, Threads: 600
5) Client nodes:20, Threads: 1200

So adding more client nodes, or more threads on those client nodes, has no
effect.
I suspect Cassandra is simply not letting me go any further.
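
A rough cross-check of the numbers above, using Little's law (requests in flight = throughput x time per request). This is only a sketch, and it assumes each client thread keeps exactly one blocking read outstanding at all times; the class and method names are made up for illustration. Under that assumption, 60 threads at 26,000 reads/sec implies about 2.3 ms per read, while 1200 threads at the same rate implies about 46 ms per read, which would mean the extra requests are waiting somewhere rather than being served faster.

```java
public class LittlesLaw {
    /**
     * Little's law rearranged: time per request = in-flight requests / throughput.
     * Assumes one outstanding request per thread (an assumption, not a measurement).
     */
    public static double impliedLatencyMillis(int inFlight, double throughputPerSec) {
        return inFlight / throughputPerSec * 1000.0;
    }

    public static void main(String[] args) {
        int[] totalThreads = {60, 180, 300, 600, 1200}; // thread counts from the test runs
        double readsPerSec = 26_000.0;                  // plateau observed in every run
        for (int threads : totalThreads) {
            System.out.printf("%4d threads -> ~%.1f ms per read%n",
                    threads, impliedLatencyMillis(threads, readsPerSec));
        }
    }
}
```

If the latency measured on the client grows roughly like this as threads are added, the bottleneck is downstream of the client threads.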

Reads/second is measured both by OpsCenter and by my own logic in the
client code. The two agree, more or less.

The primary key for my schema is:
    PRIMARY KEY((name, phone), age)
name: text
phone: int
age: int

And there are very few columns in the table apart from the above. None of
them is more than a few bytes long.

1) Does Cassandra have something like debug=timing
<http://stackoverflow.com/questions/14712175/splitting-solr-response-time>
in Solr which gives all the information about where query-time was spent?
2) Does nodetool have something that can give me a hint?
3) Are there JMX metrics I should look for?
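
For the JMX question, one place to start is the coordinator-level read latency timer that Cassandra registers as `org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency`. The sketch below reads a few of its attributes using only the JDK's JMX client. It assumes the default JMX port 7199 with no authentication or SSL, and the class and helper names are illustrative, not part of any Cassandra tooling.

```java
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadLatencyProbe {
    /** MBean name of the coordinator read-latency timer. */
    public static ObjectName readLatencyName() {
        try {
            return new ObjectName(
                "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e); // cannot happen for this constant
        }
    }

    /** Standard RMI-based JMX service URL for a host and port. */
    public static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = 7199; // assumed default JMX port, adjust if your nodes differ
        JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = readLatencyName();
            // DurationUnit reports the unit used for Mean and the percentiles.
            String[] attrs = {"Count", "OneMinuteRate", "Mean", "99thPercentile", "DurationUnit"};
            for (String attr : attrs) {
                System.out.println(attr + " = " + mbs.getAttribute(name, attr));
            }
        }
    }
}
```

Running it against each node in turn and comparing `OneMinuteRate` and `99thPercentile` is one way to see whether a single node is lagging behind the others.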


Thanks
SG





On Sun, Mar 19, 2017 at 5:03 AM, James Carman <james@carmanconsulting.com>
wrote:

> Have you tried using PreparedStatements?
>
> On Sat, Mar 18, 2017 at 9:47 PM S G <sg.online.email@gmail.com> wrote:
>
>> OK, I gave executeAsync() a try.
>> The good part is that it was really easy to write the code for it.
>> The bad part is that it did not have a huge effect on my throughput: I
>> gained only about 5%.
>> I suspect this is because my queries are all get-by-primary-key queries
>> that were already completing in less than 2 milliseconds.
>> So there was not much waiting to begin with.
>>
>>
>> Here is my code:
>>
>> String getByKeyQueryStr = "Select * from fooTable where key = " + key;
>> //ResultSet result = session.execute(getByKeyQueryStr);  // Previous code
>> ResultSetFuture future = session.executeAsync(getByKeyQueryStr);
>> FutureCallback<ResultSet> callback = new MyFutureCallback();
>> executor = MoreExecutors.sameThreadExecutor();
>> //executor = Executors.newFixedThreadPool(3);  // Tried this too, no effect
>> //executor = Executors.newFixedThreadPool(10); // Tried this too, no effect
>> Futures.addCallback(future, callback, executor);
>>
>> Can I improve the above code in some way?
>> Are there any JMX metrics that can tell me what's going on?
>>
>> From the vmstat command, I see that CPU idle time is about 70% even
>> though I am running about 60 threads per VM.
>> In total, 20 client VMs with 8 cores each are querying a Cassandra cluster
>> of 16 VMs, also with 8 cores each.
>>
>> [image: Screen Shot 2017-03-18 at 6.46.03 PM.png]
>>
>>
>> Thanks
>> SG
>>
>>
>> On Sat, Mar 18, 2017 at 5:38 PM, S G <sg.online.email@gmail.com> wrote:
>>
>> Thanks. It seems that you have found executeAsync to yield good
>> results.
>> I want to share my understanding of how this could benefit performance;
>> some validation from the group would be great.
>>
>> I will call executeAsync() each time I want to get a row by primary key.
>> That way, my client thread is no longer blocked and I can submit many
>> more requests per unit of time.
>> The async requests queue up on the underlying Netty I/O thread, which
>> keeps it busy all the time.
>> Earlier, the Netty I/O thread would have wasted some cycles while the
>> synchronous execute method was processing the results.
>> And the client thread would also have wasted some cycles waiting
>> for the Netty thread to complete.
>>
>> With executeAsync(), neither of them is waiting.
>> The only thing to ensure is that the Netty thread's queue does not grow
>> indefinitely.
>>
>> If the above theory is correct, then it sounds like a really good thing
>> to try.
>> If not, please do share some more details.
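
One common way to keep the number of outstanding async requests from growing without bound is to gate submissions with a semaphore and release the permit when each future completes. The sketch below uses only the JDK: `CompletableFuture` stands in for the driver's `ResultSetFuture`, and `simulatedRead` is a hypothetical placeholder for `session.executeAsync(...)`. With the real driver the permit would be released from a callback instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class BoundedAsync {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    public BoundedAsync(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller while maxInFlight requests are outstanding, then submits. */
    public <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> asyncCall) {
        permits.acquireUninterruptibly();
        maxObserved.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        CompletableFuture<T> future;
        try {
            future = asyncCall.get();
        } catch (RuntimeException e) {
            inFlight.decrementAndGet();
            permits.release(); // submission failed, give the permit back
            throw e;
        }
        // Release the permit on success or failure of the request.
        return future.whenComplete((result, error) -> {
            inFlight.decrementAndGet();
            permits.release();
        });
    }

    public int maxObserved() {
        return maxObserved.get();
    }

    /** Hypothetical stand-in for an async read that takes about 2 ms. */
    public static CompletableFuture<Integer> simulatedRead(int key, Executor pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return key;
        }, pool);
    }

    public static void main(String[] args) {
        BoundedAsync limiter = new BoundedAsync(8);
        ExecutorService pool = Executors.newFixedThreadPool(16);
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            final int key = i;
            futures.add(limiter.submit(() -> simulatedRead(key, pool)));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        System.out.println("completed=" + futures.size()
                + ", max in flight=" + limiter.maxObserved());
    }
}
```

The limit of 8 is arbitrary here; a sensible value for a real client would have to be found by measurement.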
>>
>>
>>
>>
>> On Sat, Mar 18, 2017 at 2:00 PM, <j.kesten@enercast.de> wrote:
>>
>> +1 for executeAsync. I had to argue for a long time that it is not a bad
>> idea here, as it would be with a good old RDBMS.
>>
>>
>>
>>
>>
>>
>>
>> Sent from my Windows 10 Phone
>>
>>
>>
>> *From: *Arvydas Jonusonis <arvydas.jonusonis@gmail.com>
>> *Sent: *Saturday, 18 March 2017 19:08
>> *To: *user@cassandra.apache.org
>> *Subject: *Re: How can I scale my read rate?
>>
>>
>>
>> ..then you're not taking advantage of request pipelining. Use
>> executeAsync - this will increase your throughput for sure.
>>
>>
>>
>> http://www.datastax.com/dev/blog/java-driver-async-queries
>>
>>
>>
>>
>>
>> On Sat, Mar 18, 2017 at 08:00 S G <sg.online.email@gmail.com> wrote:
>>
>> I have enabled JMX but am not sure which metrics to look for; there are
>> way too many of them.
>>
>> I am using session.execute(...)
>>
>>
>>
>>
>>
>> On Fri, Mar 17, 2017 at 2:07 PM, Arvydas Jonusonis <
>> arvydas.jonusonis@gmail.com> wrote:
>>
>> It would be interesting to see some of the driver metrics (in your stress
>> test tool) - if you enable JMX, they should be exposed by default.
>>
>> Also, are you using session.execute(..) or session.executeAsync(..) ?
>>
>>
>>
>>
>>
>>
>>
