cassandra-user mailing list archives

From Anuj Wadehra <anujw_2...@yahoo.co.in>
Subject Re: Throttling Cassandra Load
Date Mon, 28 Sep 2015 04:23:10 GMT
Hi,


Any suggestions or comments on the approach? What are you doing to keep misbehaving
clients in check and restrict Cassandra load?



Note: We will be moving to the CQL driver, but that will take months.

Anuj


From:"Anuj Wadehra" <anujw_2003@yahoo.co.in>
Date:Wed, 23 Sep, 2015 at 1:36 am
Subject:Throttling Cassandra Load

Hi,

We are using Cassandra 2.0.14 with Hector 1.1.4. Each node in the cluster runs an application
that uses Hector, alongside a Cassandra instance.

I would like suggestions on the approach we are taking for throttling Cassandra load.

Problem Statement: 
Misbehaving clients can bring down a Cassandra cluster by putting excessive load on it. We want
to prevent the cluster from being overloaded.

Solution Proposed:
1.  Run a test for each application scenario involving Cassandra. Keep adding requests in each
scenario until performance starts deteriorating, and note the maximum number of connections
reached during the test, as follows:

For example:
Scenario A = 60
Scenario B = 70
Scenario C = 90

Set rpc_max_threads = max(all scenarios) = 90

2. In Hector, set the maximum active connections per host (MaxActive) to 90.

3. As Hector maintains connections PER HOST, the number of connections a Hector client opens
across the cluster increases with cluster size.

e.g. On a 3-node cluster, each Hector client will open a total of 90 * 3 connections.
      On a 15-node cluster, each Hector client will open a total of 90 * 15 connections.

So, we have set rpc_server_type=hsha to support a large number of client connections. Not sure
whether https://issues.apache.org/jira/i#browse/CASSANDRA-7309 is a concern.

4. At the application level, we check cluster load by ADDING up the active connections created
by Hector on EACH node of the cluster. If the total is already around 95% of (90 * number of
nodes), we reject tasks to prevent overload.

5. We see that Hector closes idle connections only when borrowing clients from the pool, and
immediately after closing an idle connection it creates a new one. So once active connections
increase, they seldom go down and remain open (except in a few exceptional scenarios). Therefore
we can't rely on Cassandra's ThriftClients JMX metric to know the number of ACTIVE connections:
ThriftClients shows open connections rather than active ones. Is there a better way to know the
number of active connections on a Cassandra node, or to check Cassandra load so that we can stop
submitting tasks when a node is already overloaded?
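The sizing arithmetic in steps 1 to 3 and the admission check in step 4 can be sketched as below. This is a minimal Python illustration of the logic, not Hector code: the function names and the `active_by_host` mapping are hypothetical, and it assumes the application can already obtain a per-host count of active (borrowed) connections, which is exactly what step 5 asks about.

```python
# Peak connection counts observed in the per-scenario load tests (step 1),
# using the example numbers from the list above.
SCENARIO_PEAKS = {"A": 60, "B": 70, "C": 90}


def rpc_max_threads(peaks):
    """Step 1: rpc_max_threads is the highest peak across all scenarios."""
    return max(peaks.values())


def connections_per_client(max_active_per_host, num_nodes):
    """Step 3: Hector pools connections per host, so one client can open
    max_active_per_host connections to every node in the cluster."""
    return max_active_per_host * num_nodes


def should_reject(active_by_host, max_active_per_host, threshold=0.95):
    """Step 4: reject new tasks once the summed active connections reach
    ~95% of (max_active_per_host * number of nodes).

    active_by_host is a hypothetical mapping {host: active connection count}
    that is assumed to contain one entry per node. With no nodes known,
    capacity is 0 and the check rejects.
    """
    capacity = max_active_per_host * len(active_by_host)
    total_active = sum(active_by_host.values())
    return total_active >= threshold * capacity


if __name__ == "__main__":
    max_active = rpc_max_threads(SCENARIO_PEAKS)
    print(max_active)                              # 90
    print(connections_per_client(max_active, 3))   # 270
    print(connections_per_client(max_active, 15))  # 1350
    # 88 + 85 + 86 = 259 active, above 95% of 270, so new tasks are rejected.
    print(should_reject({"n1": 88, "n2": 85, "n3": 86}, max_active))  # True
```

Summing across hosts, as step 4 describes, treats the cluster as one pool; a per-host check would instead catch a single hot node even when the cluster-wide total is still below the threshold.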


I am looking for suggestions on the above approach and more ideas on throttling Cassandra load.

Thanks
Anuj

