Thanks for the quick replies guys!

To explain why I wanted to understand these two measures: I do batch inserts to Cassandra, but the batches are not fixed in size, i.e. the number of columns in a batch varies, and so does the data type of the values placed in the columns (the column name is always a long timestamp). This also makes it hard to predict the actual data rate I am sending to Cassandra. I thought that if I could get a cluster-wide measurement of the batch insertions per second, and also of the individual column insertions per second, I could understand better what's happening.

So, from what you guys said I understand that:
- the StorageProxy WriteOperations attribute gives me the batch insertions per second sent to the cluster (so this is fine)
- the Commitlog CompletedTasks attribute is definitely a closer measurement of the single column insertions, but it is not accurate (i.e. it will be higher) because several types of row mutations can happen when any column is inserted. How close is this measurement to the single column insertions per second I want to obtain? Is there anything I can use to get a more accurate measurement of the single column insertions per second, or is it good enough?


On Wed, Oct 12, 2011 at 4:18 AM, Tyler Hobbs <> wrote:
The OpsCenter graph you're referring to basically does the following:

1. For each node, find out how much the WriteOperations attribute of the StorageProxy increased during the last minute.
2. Sum these values to get a total for the cluster.
3. Divide by 60 to get an average number of WriteOperations per second for the cluster.
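
The three steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not OpsCenter's actual code: the function name, the dictionary layout, and the sample numbers are all made up, and the counter values are assumed to have been read from each node's StorageProxy WriteOperations attribute at the start and end of the minute.

```python
def cluster_writes_per_second(previous, current, interval_seconds=60):
    """Average cluster-wide WriteOperations per second over one interval.

    `previous` and `current` map a node name to the WriteOperations
    counter value read at the start and at the end of the interval.
    (Hypothetical data layout, chosen for this sketch only.)
    """
    # Step 1: per-node increase of the counter during the interval.
    increases = [current[node] - previous[node] for node in current]
    # Step 2: sum the increases to get a total for the cluster.
    total_increase = sum(increases)
    # Step 3: divide by the interval length to get an average per second.
    return total_increase / interval_seconds


# Made-up counter readings for a 3-node cluster, taken one minute apart.
start = {"node1": 1000, "node2": 2000, "node3": 500}
end = {"node1": 1600, "node2": 2300, "node3": 800}
rate = cluster_writes_per_second(start, end)
```

With these sample readings the per-node increases are 600, 300 and 300, so the cluster total is 1200 writes in 60 seconds.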

On Tue, Oct 11, 2011 at 3:55 PM, aaron morton <> wrote:
It's the number of mutations; a mutation is a collection of changes for a single row across one or more column families.

Take a look at nodetool cfstats; this is where I assume OpsCenter is getting its data from.

Aaron Morton
Freelance Cassandra Developer

On 12/10/2011, at 3:44 AM, Alexandru Dan Sicoe wrote:

Hello everyone,
 I was trying to get some cluster-wide statistics of the total insertions performed in my 3-node Cassandra 0.8.6 cluster. So I wrote a nice little program that gets the CompletedTasks attribute of org.apache.cassandra.db:type=Commitlog from every node, sums up the values and records the total in a .csv every 10 sec or so. Everything works and I get my stats, but later I found out that I am not really sure what this measure means. I think it is the number of individual column insertions performed! Am I correct?
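
A minimal sketch of that kind of collector, for illustration only (it is not the original program). The `read_completed_tasks` callable is a hypothetical stand-in for whatever JMX client reads the CompletedTasks attribute from one node; here it is replaced by a plain dictionary lookup with made-up values so the example runs on its own.

```python
import csv
import io
import time


def record_total(read_completed_tasks, nodes, csv_writer, timestamp=None):
    """Sum CompletedTasks over all nodes and append one CSV row.

    `read_completed_tasks(node)` is a hypothetical callable that returns
    the current counter value for one node (e.g. via a JMX client).
    """
    if timestamp is None:
        timestamp = time.time()
    total = sum(read_completed_tasks(node) for node in nodes)
    # One row per sample: time of the sample, cluster-wide total.
    csv_writer.writerow([int(timestamp), total])
    return total


# Stand-in counter values instead of real JMX reads (made-up numbers).
fake_counters = {"node1": 120, "node2": 95, "node3": 110}
buffer = io.StringIO()
total = record_total(fake_counters.get, sorted(fake_counters),
                     csv.writer(buffer), timestamp=0)
```

In a real collector this function would be called in a loop with a sleep of about 10 seconds between samples. Assuming CompletedTasks is a cumulative counter, a rate is then the difference between two consecutive rows divided by the time between them.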
 In the meantime I installed the trial version of the DataStax Operations Center. The cluster-wide dashboard, showing writes performed as a function of time, gives me much smaller rates than the measurement I described before. The DataStax writes/sec are of the same order of magnitude as the batch writes I perform on the cluster. But somehow I cannot relate this rate to the rate of my CompletedTasks measurement.

How do people usually measure insertion rates for their clusters? Per batch, per single column, or is the actual data rate more important to know?


Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library

Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow