cassandra-user mailing list archives

From Alexandru Dan Sicoe <sicoe.alexan...@googlemail.com>
Subject Re: CompletedTasks attribute exposed via JMX
Date Wed, 12 Oct 2011 07:31:17 GMT
Thanks for the quick replies guys!

To explain why I wanted to understand these two measures: I do batch
inserts to Cassandra, but the batches are not fixed in size. The number of
columns in a batch varies, and so does the data type of the values placed in
the columns (the column name is always a long timestamp). This makes it hard
to predict the actual data rate I am sending to Cassandra. I thought that if
I could get a cluster-wide measurement of the batch insertions per second and
of the individual column insertions per second, I could better understand
what's happening.

So, from what you said, I understand that:
- the StorageProxy WriteOperations attribute gives me the batch insertions
per second sent to the cluster (so this is fine)
- the Commitlog CompletedTasks attribute is definitely a closer measurement
of the single column insertions, but it is not accurate (i.e. it will be
higher) because several types of row mutations can happen when any column is
inserted

How close is this measurement to the single column insertions per second I
want to obtain? Is there anything I can use to get a more accurate
measurement of single column insertions per second, or is it good enough?

Cheers,
Alexandru

On Wed, Oct 12, 2011 at 4:18 AM, Tyler Hobbs <tyler@datastax.com> wrote:

> The OpsCenter graph you're referring to basically does the following:
>
> 1. For each node, find out how much the WriteOperations attribute of the
> StorageProxy increased during the last minute.
> 2. Sum these values to get a total for the cluster.
> 3. Divide by 60 to get an average number of WriteOperations per second for
> the cluster.
>
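The three steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not OpsCenter's actual code: the function name `cluster_rate_per_second`, the node names, and the counter values are all hypothetical, and the handling of a counter that went backwards (treated as a node restart) is an assumption.

```python
from typing import Mapping


def cluster_rate_per_second(
    previous: Mapping[str, int],
    current: Mapping[str, int],
    window_seconds: float = 60.0,
) -> float:
    """Average cluster-wide operations per second over one sampling window.

    `previous` and `current` map a node name to the value of a monotonically
    increasing counter (for example WriteOperations) read at the start and at
    the end of the window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    total_increase = 0
    for node, count_now in current.items():
        count_before = previous.get(node)
        if count_before is None:
            # No earlier sample for this node, so no delta can be computed.
            continue
        if count_now < count_before:
            # Assumption: the counter was reset (node restart), so everything
            # counted since the reset is taken as the increase.
            total_increase += count_now
        else:
            # Step 1: how much the counter grew on this node.
            total_increase += count_now - count_before

    # Step 2 happened in the loop (sum over nodes); step 3 is the division.
    return total_increase / window_seconds
```

With made-up samples of `{"n1": 1000, "n2": 2000, "n3": 500}` and, one minute later, `{"n1": 1600, "n2": 2900, "n3": 800}`, the increases are 600, 900 and 300, so the function returns 1800 / 60 = 30.0 operations per second.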
>
> On Tue, Oct 11, 2011 at 3:55 PM, aaron morton <aaron@thelastpickle.com> wrote:
>
>> It's the number of mutations; a mutation is a collection of changes for a
>> single row across one or more column families.
>>
>> Take a look at nodetool cfstats, which is where I assume OpsCenter is
>> getting its data from.
>>
>> Cheers
>>
>>  -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 12/10/2011, at 3:44 AM, Alexandru Dan Sicoe wrote:
>>
>> Hello everyone,
>>  I was trying to get some cluster wide statistics of the total insertions
>> performed in my 3 node Cassandra 0.8.6 cluster. So I wrote a nice little
>> program that gets the CompletedTasks attribute of
>> org.apache.cassandra.db:type=Commitlog from every node, sums up the values
>> and records them in a .csv every 10 sec or so. Everything works and I get my
>> stats but later I found out that I am not really sure what this measure
>> means. I think it is the individual column insertions performed! Am I
>> correct?
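A polling program like the one described above might look roughly like the following Python sketch. It is not the original program: `record_cluster_totals` and `read_counter` are hypothetical names, and `read_counter` stands in for whatever JMX client call actually fetches the CompletedTasks attribute from a node. The `sleep` and `clock` parameters exist only so the loop can be exercised without waiting.

```python
import csv
import time
from typing import Callable, Sequence, TextIO


def record_cluster_totals(
    read_counter: Callable[[str], int],
    nodes: Sequence[str],
    out: TextIO,
    samples: int,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> None:
    """Write `samples` rows of (timestamp, summed counter) to `out` as CSV.

    `read_counter(node)` is a placeholder for the JMX read; it must return
    the current counter value for that node.
    """
    writer = csv.writer(out)
    writer.writerow(["timestamp", "cluster_total"])
    for i in range(samples):
        # Sum the raw counter over every node in the cluster.
        total = sum(read_counter(node) for node in nodes)
        writer.writerow([int(clock()), total])
        if i < samples - 1:
            # Wait before taking the next sample.
            sleep(interval_seconds)
```

The recorded values are cumulative totals, so a per-second rate is obtained by dividing the difference between two consecutive rows by the sampling interval.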
>>  In the meantime I installed the trial version of the DataStax Operations
>> Center. The cluster wide dashboard, showing Writes performed as a function
>> of time, gives me much smaller rates than the
>> measurement I described before. The DataStax writes/sec are of the same
>> order of magnitude as the batch writes I perform on the cluster, but
>> I cannot relate this rate to the rate of my CompletedTasks
>> measurement.
>>
>> How do people usually measure insertion rates for their clusters? Per
>> batch, per single column, or is the actual data rate more important to know?
>>
>> Cheers,
>> Alexandru
>>
>>
>>
>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax <http://datastax.com/>
> Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
> Python client library
>
>


-- 
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow
