cassandra-user mailing list archives

From Alexandru Dan Sicoe <>
Subject Re: CompletedTasks attribute exposed via JMX
Date Wed, 12 Oct 2011 07:31:17 GMT
Thanks for the quick replies, guys!

Just to explain why I wanted to understand these two measures: I do batch
inserts to Cassandra, but the batches are not fixed in size, i.e. the number
of columns in a batch varies, and so does the data type of the values placed
in the columns (the column name is always a long, a timestamp). This also
makes it hard to predict the actual data rate I am sending to Cassandra. I
thought that if I could get a cluster-wide measurement of the batch
insertions per second, and also of the individual column insertions per
second, I could better understand what's happening.

So, from what you guys said, I understand that:
- the StorageProxy WriteOperations attribute gives me the batch insertions
per second sent to the cluster (so this is fine)
- the Commitlog CompletedTasks attribute is definitely a closer measurement
of the single column insertions, but it is not accurate (i.e. it will be
higher) because several types of row mutations can happen when any column is
inserted

How close is this measurement to the single column insertions per second I
want to obtain? Is there anything I can use to get a more accurate
measurement of the single column insertions per second, or is it good enough?
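
For reference, here is a rough sketch of how a cluster-wide rate could be
derived from either counter, following the steps Tyler describes in the
quoted reply below: take the per-node increase, sum over the nodes, and
divide by the interval. It assumes the attribute values have already been
read over JMX into plain dictionaries. The function name cluster_rate and the
node names are hypothetical, and skipping a node whose counter went backwards
(for example after a restart) is my own assumption, not something OpsCenter
is known to do.

```python
def cluster_rate(prev, curr, interval_s):
    """Average cluster-wide operations per second between two samples.

    prev and curr map a node name to the cumulative counter value read
    from that node (e.g. WriteOperations or CompletedTasks).
    interval_s is the time between the two samples, in seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    total = 0
    for node, value in curr.items():
        before = prev.get(node)
        if before is None:
            # Node was not present in the previous sample: no delta available.
            continue
        delta = value - before
        if delta < 0:
            # Assumption: a decreasing counter means the node restarted,
            # so this interval is skipped for that node.
            continue
        total += delta

    return total / interval_s


# Hypothetical counter values for a 3-node cluster, sampled 60 s apart.
prev = {"node1": 1000, "node2": 2000, "node3": 1500}
curr = {"node1": 1600, "node2": 2900, "node3": 1800}
rate = cluster_rate(prev, curr, 60)  # (600 + 900 + 300) / 60
```

Running the same function over the WriteOperations samples and over the
CompletedTasks samples gives two rates measured in the same way, so the two
can be compared directly.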


On Wed, Oct 12, 2011 at 4:18 AM, Tyler Hobbs <> wrote:

> The OpsCenter graph you're referring to basically does the following:
> 1. For each node, find out how much the WriteOperations attribute of the
> StorageProxy increased during the last minute.
> 2. Sum these values to get a total for the cluster.
> 3. Divide by 60 to get an average number of WriteOperations per second for
> the cluster.
> On Tue, Oct 11, 2011 at 3:55 PM, aaron morton <> wrote:
>> It's the number of mutations; a mutation is a collection of changes for a
>> single row across one or more column families.
>> Take a look at the nodetool cfstats, this is where I assume Ops Centre is
>> getting its data from.
>> Cheers
>>  -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> On 12/10/2011, at 3:44 AM, Alexandru Dan Sicoe wrote:
>> Hello everyone,
>>  I was trying to get some cluster wide statistics of the total insertions
>> performed in my 3 node Cassandra 0.8.6 cluster. So I wrote a nice little
>> program that gets the CompletedTasks attribute of
>> org.apache.cassandra.db:type=Commitlog from every node, sums up the values
>> and records them in a .csv every 10 sec or so. Everything works and I get my
>> stats but later I found out that I am not really sure what this measure
>> means. I think it is the individual column insertions performed! Am I
>> correct?
>>  In the meantime I installed the trial version of the DataStax Operations
>> Center. The cluster wide dashboard, showing Writes performed as a function
>> of time, gives me much smaller values of the rates, compared to the
>> measurement I described before. The DataStax writes/sec are of the same
>> order of magnitude as the batch writes I perform on the cluster. But somehow
>> I cannot relate between this rate and the rate of my CompletedTasks
>> measurement.
>> How do people usually measure insertion rates for their clusters? Per
>> batch, per single column, or is the actual data rate more important to know?
>> Cheers,
>> Alexandru
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra
> Python client library

Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow
