cassandra-user mailing list archives

From aaron morton <>
Subject Re: CompletedTasks attribute exposed via JMX
Date Wed, 12 Oct 2011 21:31:00 GMT
Storage proxy will give you the total writes through the server, for all CFs. 

The CommitLog thread pool is not what you want. It is not designed to measure column or row
throughput; it only counts how many tasks have run through the thread pool.

The closest thing to recording the number of columns is the MemtableColumnCount in the per
CF stats in JMX (and cfinfo in nodetool). It is updated here

- it only counts top level columns, not sub columns
- it includes deletes
- it is per Memtable, so it is cleared when a new memtable is switched in. 
- the number is also included in the logs when the memtable is flushed  
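
Because the count is per memtable, a poller has to cope with it dropping back after a flush. The sketch below is one rough way to turn periodic MemtableColumnCount readings into a running total. The function name and the sample values are hypothetical, and it assumes any drop between two readings means a new memtable was switched in. Columns written between the last reading and the flush are missed, and a flush that goes unnoticed between two readings is not detected, so treat the result as a lower bound.

```python
def columns_written(samples):
    """Estimate columns written from successive MemtableColumnCount readings.

    Assumption: a reading lower than the previous one means the memtable
    was switched, so the new reading is counted from zero.
    """
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            # Same memtable: count the growth since the last reading.
            total += cur - prev
        else:
            # Counter dropped: a new memtable started, count what it holds.
            total += cur
    return total


# Hypothetical readings taken every few seconds; the drop from 300 to 40
# is treated as a memtable flush.
readings = [0, 120, 300, 40, 90]
print(columns_written(readings))
```

Keep in mind the caveats above still apply to the total: deletes are included and sub columns are not.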

Hope that helps. 

Aaron Morton
Freelance Cassandra Developer

On 12/10/2011, at 8:31 PM, Alexandru Dan Sicoe wrote:

> Thanks for the quick replies guys!
> Just to explain to you why I wanted to understand these two measures, I do batch inserts
> to Cassandra but the batches are not fixed in size i.e. the number of columns in a batch varies
> and also the data type of the values placed in the columns varies (the name of the columns
> is always a long - timestamp) => this also makes it hard to predict the actual data rate
> I am sending to Cassandra. I thought that if I can get a cluster wide measurement of the batch
> insertions per second and also of the individual column insertions per second I can understand
> better what's happening.
> So, from what you guys said I understand that:
> - the StorageProxy WriteOperations attribute gives me the batch insertions per second
> sent to the cluster (so this is fine)
> - the Commitlog CompletedTasks attribute is definitely a closer measurement to the single
> column insertions but it is not accurate (i.e. it will be higher) because several types of
> row mutations can happen when any column is inserted. How close is this measurement to the
> single column insertions per second I want to obtain? Is there anything I can use to get a
> more accurate measurement of the single column insertions per sec or is it good enough?
> Cheers,
> Alexandru
> On Wed, Oct 12, 2011 at 4:18 AM, Tyler Hobbs <> wrote:
> The OpsCenter graph you're referring to basically does the following:
> 1. For each node, find out how much the WriteOperations attribute of the StorageProxy
> increased during the last minute.
> 2. Sum these values to get a total for the cluster.
> 3. Divide by 60 to get an average number of WriteOperations per second for the cluster.
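
The three steps above can be sketched as follows. This is only an illustration of the arithmetic, not OpsCenter's actual code: the function name, the node names and the counter values are hypothetical, and the two dictionaries are assumed to hold the StorageProxy WriteOperations value read from each node one interval apart.

```python
def cluster_writes_per_second(before, after, interval_seconds=60):
    """Average cluster-wide WriteOperations per second over one interval.

    `before` and `after` map a node name to its WriteOperations counter
    at the start and at the end of the interval.
    """
    # Step 1: how much each node's counter increased during the interval.
    deltas = [after[node] - before[node] for node in before]
    # Steps 2 and 3: sum over the cluster, then divide by the interval length.
    return sum(deltas) / interval_seconds


# Hypothetical counter readings taken 60 seconds apart on a 3 node cluster.
before = {"node1": 1000, "node2": 2000, "node3": 1500}
after = {"node1": 1600, "node2": 2900, "node3": 1800}
print(cluster_writes_per_second(before, after))
```
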
> On Tue, Oct 11, 2011 at 3:55 PM, aaron morton <> wrote:
> It's the number of mutations; a mutation is a collection of changes for a single row across
> one or more column families.
> Take a look at the nodetool cfstats, this is where I assume Ops Centre is getting its
> data from.
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> On 12/10/2011, at 3:44 AM, Alexandru Dan Sicoe wrote:
>> Hello everyone,
>>  I was trying to get some cluster wide statistics of the total insertions performed
>> in my 3 node Cassandra 0.8.6 cluster. So I wrote a nice little program that gets the CompletedTasks
>> attribute of org.apache.cassandra.db:type=Commitlog from every node, sums up the values and
>> records them in a .csv every 10 sec or so. Everything works and I get my stats but later I
>> found out that I am not really sure what this measure means. I think it is the individual
>> column insertions performed! Am I correct?
>>  In the meantime I installed the trial version of the DataStax Operations Center.
>> The cluster wide dashboard, showing Writes performed as a function of time, gives me much
>> smaller values of the rates, compared to the measurement I described before. The DataStax
>> writes/sec are of the same order of magnitude as the batch writes I perform on the cluster.
>> But somehow I cannot relate between this rate and the rate of my CompletedTasks measurement.
>> How do people usually measure insertion rates for their clusters? Per batch, per
>> single columns or is actual data rate more important to know?
>> Cheers,
>> Alexandru
> -- 
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
> -- 
> Alexandru Dan Sicoe
> MEng, CERN Marie Curie ACEOLE Fellow