Thanks for the quick replies guys!
To explain why I wanted to understand these two measures: I do batch inserts to Cassandra, but the batches are not fixed in size. The number of columns in a batch varies, and so does the data type of the values placed in the columns (the column name is always a long timestamp). This makes it hard to predict the actual data rate I am sending to Cassandra. I thought that if I could get a cluster-wide measurement of the batch insertions per second, and also of the individual column insertions per second, I could better understand what's happening.
So, from what you guys said I understand that:
- the StorageProxy WriteOperations attribute gives me the batch insertions per second sent to the cluster (so this is fine)
- the Commitlog CompletedTasks attribute is a closer measurement of the single column insertions, but it is not accurate (i.e. it will be higher) because several types of row mutations can happen when any column is inserted
How close is this measurement to the single column insertions per second I want to obtain? Is there anything I can use to get a more accurate measurement of the single column insertions per second, or is it good enough?
The OpsCenter graph you're referring to basically does the following:
1. For each node, find out how much the WriteOperations attribute of the StorageProxy increased during the last minute.
2. Sum these values to get a total for the cluster.
3. Divide by 60 to get an average number of WriteOperations per second for the cluster.
--
On Tue, Oct 11, 2011 at 3:55 PM, aaron morton <email@example.com> wrote:
It's the number of mutations; a mutation is a collection of changes for a single row across one or more column families.
Take a look at nodetool cfstats; this is where I assume OpsCenter is getting its data from.
Cheers
On 12/10/2011, at 3:44 AM, Alexandru Dan Sicoe wrote:
Hello everyone,
I was trying to get some cluster-wide statistics of the total insertions performed in my 3-node Cassandra 0.8.6 cluster. So I wrote a nice little program that gets the CompletedTasks attribute of org.apache.cassandra.db:type=Commitlog from every node, sums up the values and records them in a .csv every 10 sec or so. Everything works and I get my stats, but later I found out that I am not really sure what this measure means. I think it is the individual column insertions performed! Am I correct?
In the meantime I installed the trial version of the DataStax Operations Center. The cluster-wide dashboard, showing writes performed as a function of time, gives me much smaller rates than the measurement I described before. The DataStax writes/sec are of the same order of magnitude as the batch writes I perform on the cluster, but I cannot relate this rate to the rate from my CompletedTasks measurement.
How do people usually measure insertion rates for their clusters? Per batch, per single column, or is the actual data rate more important to know?
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library
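The three OpsCenter steps described in this thread (per-node increase of a cumulative counter, sum across nodes, divide by the interval) can be sketched in a few lines of Python. This is only an illustration, not OpsCenter's actual code: the function name cluster_rate, the node names, and the sample values are made up, and it assumes the counter values (for example StorageProxy WriteOperations) have already been read from each node over JMX by some other means. Skipping a node whose counter went backwards (for example after a restart) is also an assumption, not something stated in the thread.

```python
def cluster_rate(previous, current, interval_seconds):
    """Average cluster-wide operations per second between two samples.

    previous, current: dicts mapping node name -> cumulative counter value
    (hypothetical input format; the values would come from JMX).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    total_increase = 0
    for node, now in current.items():
        before = previous.get(node)
        if before is None:
            continue  # no baseline for this node yet
        if now < before:
            continue  # counter went backwards (assumed restart): skip node
        total_increase += now - before  # step 1: per-node increase

    # steps 2 and 3: the sum over nodes, divided by the sampling interval
    return total_increase / interval_seconds


# Made-up samples taken 60 seconds apart on a 3-node cluster.
previous = {"node1": 1000, "node2": 2000, "node3": 1500}
current = {"node1": 1600, "node2": 2900, "node3": 1800}
print(cluster_rate(previous, current, 60))
```

The same function works for the Commitlog CompletedTasks counter sampled every 10 seconds; only the interval changes.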