cassandra-user mailing list archives

From Robert Stupp <sn...@snazy.de>
Subject Re: Could ring cache really improve performance in Cassandra?
Date Mon, 08 Dec 2014 08:42:02 GMT
cassandra-stress is a great tool to check whether the sizing of your cluster, in combination
with your data model, will fit your production needs, i.e. without the application :) Removing
the application removes any possible client-side bugs from the load test. Sure, it's a necessary step
to test with your application too, but I'd recommend starting with the stress test tool.

Thrift is a deprecated API. I strongly recommend using the C++ driver (I'm pretty sure it supports
the native protocol). The native protocol achieves approx. twice the performance of Thrift
with far fewer TCP connections. (Thrift is RPC, which means connections usually waste system, application
and server resources while waiting for a response. The native protocol is multiplexed: many requests
can be in flight on one connection.)
As Jonathan already said, all development effort is spent on CQL3 and the native protocol; Thrift
is just "supported".

With CQL you can do everything that you can do with Thrift, plus new features.

I also recommend using prepared statements (they automagically work in a distributed cluster
with the native protocol); they eliminate the effort of parsing the same CQL statement again and again.
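
To make the routing idea from the quoted thread below concrete, here is a minimal sketch of the lookup a client-side ring cache performs: given a partition token, find the node whose token range contains it. Everything here is an assumption for illustration: the class name, `add_node`, `owner`, `make_example_ring`, the tokens and the addresses are hypothetical and are not the API of RingCache.java or of any driver. It also assumes the token is already computed (a real client would derive it from the partition key with the cluster's partitioner) and ignores replication, so it only models replication factor 1.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified ring cache: maps each node's token (the inclusive
// end of the range it owns) to that node's address.
class RingCache {
public:
    // Register a node that owns the range (previous token, end_token].
    void add_node(std::int64_t end_token, const std::string& host) {
        ring_[end_token] = host;
    }

    // Return the node owning `token`: the first node whose end token is
    // >= token, wrapping around to the lowest token past the end of the ring.
    const std::string& owner(std::int64_t token) const {
        assert(!ring_.empty() && "ring must contain at least one node");
        auto it = ring_.lower_bound(token);
        if (it == ring_.end()) {
            it = ring_.begin();  // wrap-around range
        }
        return it->second;
    }

private:
    std::map<std::int64_t, std::string> ring_;  // sorted by token
};

// Build a three-node example ring with made-up tokens and addresses.
inline RingCache make_example_ring() {
    RingCache ring;
    ring.add_node(-100, "10.0.0.1");
    ring.add_node(0, "10.0.0.2");
    ring.add_node(100, "10.0.0.3");
    return ring;
}
```

With this example ring, a token of 50 maps to the node registered at token 100, and a token above 100 wraps around to the node registered at token -100. Token-aware drivers do this kind of lookup internally, so a hand-written version is mainly useful for understanding what the ring cache saves: one coordinator-to-replica hop per request.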


> On 08.12.2014 at 09:26, 孔嘉林 <kongjialin92@gmail.com> wrote:
> 
> Thanks Jonathan. Actually, I'm wondering how CQL is implemented underneath: is it a different
RPC mechanism? Why is it faster than Thrift? I know I'm wrong, but for now I just regard CQL as
a query language. Could you please help explain this to me? I still feel puzzled after reading
some docs about CQL. I create tables in CQL and use the cql3 API in Thrift. I don't know what
else I can do with CQL. I am using C++ to write the client-side code. Currently I am not
using the C++ driver and want to write some simple functionality myself.
> 
> Also, I didn't use the stress test tool provided in the Cassandra distribution because
I also want to make sure I can achieve the expected performance using my own client
code. I know others have benchmarked Cassandra and got good results. But if I cannot reproduce
those satisfactory results, I cannot use it in my case.
> 
> I will create a repo and send a link later, hope to get your kind help.
> 
> Thanks very much.
> 
> 2014-12-08 14:28 GMT+08:00 Jonathan Haddad <jon@jonhaddad.com>:
> I would really not recommend using thrift for anything at this point, including your
load tests.  Take a look at CQL, all development is going there and has in 2.1 seen a massive
performance boost over 2.0.
> 
> You may want to try the Cassandra stress tool included in 2.1, it can stress a table
you've already built.  That way you can rule out any bugs on the client side.  If you're going
to keep using your tool, however, it would be helpful if you sent out a link to the repo,
since currently we have no way of knowing if you've got a client side bug (data model or code)
that's limiting your performance.
> 
> 
> On Sun Dec 07 2014 at 7:55:16 PM 孔嘉林 <kongjialin92@gmail.com> wrote:
> I found that under the src/client folder of the Cassandra 2.1.0 source code there is a RingCache.java
file. It uses a Thrift client calling the describe_ring() API to get the token range of each
Cassandra node. It is used on the client side: the client can combine it with the partitioner
to find the target node. This way there is no need to route requests between Cassandra nodes,
and the client can connect directly to the target node. So maybe it can save some routing
time and improve performance.
> Thank you very much.
> 
> 2014-12-08 1:28 GMT+08:00 Jonathan Haddad <jon@jonhaddad.com>:
> What's a ring cache?
> 
> FYI if you're using the DataStax CQL drivers they will automatically route requests to
the correct node.
> 
> On Sun Dec 07 2014 at 12:59:36 AM kong <kongjialin92@gmail.com> wrote:
> Hi,
> 
> I'm running a stress test on Cassandra. I learned that using a ring cache can improve
performance because client requests go directly to the target Cassandra server, so
the coordinator node is also the node that owns the data. This way, the
coordinator does not need to route client requests to the target node, and maybe we can get a
linear performance increase.
> 
>  
> 
> However, in my stress test on an Amazon EC2 cluster, the results are weird. There seems
to be no performance improvement from using the ring cache. Could anyone help me explain
these results? (Also, I think the results without the ring cache are weird, because
QPS does not increase linearly when new nodes are added. I need help explaining this, too.)
The results are as follows:
> 
>  
> 
> INSERT (write):
> 
> Node count | Replication factor | QPS (no ring cache) | QPS (ring cache)
> -----------+--------------------+---------------------+-----------------
>          1 |                  1 |               18687 |            20195
>          2 |                  1 |               20793 |            26403
>          2 |                  2 |               22498 |            21263
>          4 |                  1 |               28348 |            30010
>          4 |                  3 |               28631 |            24413
> 
>  
> 
> SELECT (read):
> 
> Node count | Replication factor | QPS (no ring cache) | QPS (ring cache)
> -----------+--------------------+---------------------+-----------------
>          1 |                  1 |               24498 |            22802
>          2 |                  1 |               28219 |            27030
>          2 |                  2 |               35383 |            36674
>          4 |                  1 |               34648 |            28347
>          4 |                  3 |               52932 |            52590
> 
>  
> 
>  
> 
> Thank you very much,
> 
> Joy
> 
> 
> 

