incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aaron morton <aa...@thelastpickle.com>
Subject Re: Important Variables for Scaling
Date Thu, 16 Jun 2011 12:00:08 GMT
It's a difficult questions to answer in the abstract. Some thoughts...

Scaling by adding one node at  time is not optimal. The best case scenario is to double the
number of nodes, as this means existing nodes only have to stream their data to a new node.
Obviously this is not always possible. When adding less nodes existing nodes keeping a balanced
ring may mean streams data to other nodes and accepting data from nodes.  

In general try to keep the data volume < 50% full, so there is lots of free space to do
moves. 

In general nodes with a few 100GB's of data are easiest to manage. 

The pending column in nodetool tpstats will let you know how many read or write requests are
waiting to be serviced. If this is consistently above concurrent_reads or concurrent_writes
it means there is a queue > 1 for each thread. This will add to request latency, once maximum
throughput is reached additional requests will queue. See the SEDA paper.  

Sometime in the 0.7 dev client connection pooling was added to better manage those resources.
See cassandra.yaml for info.  

The o.a.c.db.StorageProxy JMX MBean provides a latency trackers for total request time including
wait times. And o.a.c.db.ColumnFamiles... provides latency trackers for the local node operations
to do the read or write.

if your data set grows quickly watch the disk space etc. If you do a lot of requests but you
data grows slowly watch the throughout and latency numbers. 

 
Hope that helps.   
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 16 Jun 2011, at 18:28, Schuilenga, Jan Taeke wrote:

> Which variables (for instance: throughput, CPU, I/O, connections) are leading in deciding
to add a node to a Cassandra setup which is put under strain. We are trying to proove scalibility,
but when is the time there to add a node and have the optimum scalibilty result.
> 


Mime
View raw message