cassandra-user mailing list archives

From Jake Luciani <jak...@gmail.com>
Subject Re: Cassandra Stress Test Result Evaluation
Date Mon, 09 Mar 2015 15:38:51 GMT
Your insert settings look unrealistic, since I doubt you would be
writing 50k rows at a time. Try setting this to 1 row per partition;
I would expect you to get much more consistent numbers across runs:
select: fixed(1)/100000
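[Editor's note: for reference, a sketch of how that change would slot into the quoted profile, with the other settings left as in the original. Exactly how the divisor interacts with the per-partition row count is worth checking against the cassandra-stress profile documentation.]

```yaml
insert:
    partitions: fixed(100)      # unchanged from the original profile
    select: fixed(1)/100000     # suggested much smaller select ratio (was fixed(1)/2)
    batchtype: UNLOGGED
```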

On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon <nisha.menon16@gmail.com> wrote:
> I have been using the cassandra-stress tool to evaluate my cassandra cluster
> for quite some time now. My problem is that I am not able to comprehend the
> results generated for my specific use case.
>
> My schema looks something like this:
>
> CREATE TABLE Table_test(
>       ID uuid,
>       Time timestamp,
>       Value double,
>       Date timestamp,
>       PRIMARY KEY ((ID,Date), Time)
> ) WITH COMPACT STORAGE;
>
> I have encoded this schema in a custom YAML profile and run with the
> parameters n=10000 and threads=100; the rest are defaults (cl=one,
> mode=native cql3, etc.). The Cassandra cluster is a 3-node CentOS VM setup.
>
> A few specifics of the custom yaml file are as follows:
>
> insert:
>     partitions: fixed(100)
>     select: fixed(1)/2
>     batchtype: UNLOGGED
>
> columnspecs:
>     - name: Time
>       size: fixed(1000)
>     - name: ID
>       size: uniform(1..100)
>     - name: Date
>       size: uniform(1..10)
>     - name: Value
>       size: uniform(-100..100)
>
> My observations so far are as follows (Please correct me if I am wrong):
>
> 1. With n=10000 and Time: fixed(1000), the number of rows inserted is
>    10 million (10000 × 1000 = 10,000,000).
> 2. The number of row keys/partitions is 10000 (i.e., n). Each batch takes
>    100 partitions at a time (100 × 1000 = 100,000 key-value pairs), of
>    which 50,000 key-value pairs are actually inserted, because
>    select: fixed(1)/2 selects ~50%.
>
> The output message also confirms the same:
>
> Generating batches with [100..100] partitions and [50000..50000] rows
> (of[100000..100000] total rows in the partitions)
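[Editor's note: the batch arithmetic quoted above can be checked with a short script. The variable names below are illustrative, not part of the cassandra-stress profile; the values mirror the quoted YAML.]

```python
# Sanity-check the batch arithmetic implied by the stress profile.
n_partition_keys = 10_000       # n=10000 seed partitions
rows_per_partition = 1_000      # Time: size: fixed(1000)
partitions_per_batch = 100      # insert.partitions: fixed(100)
select_ratio = 1 / 2            # insert.select: fixed(1)/2

total_rows = n_partition_keys * rows_per_partition
rows_visited_per_batch = partitions_per_batch * rows_per_partition
rows_inserted_per_batch = int(rows_visited_per_batch * select_ratio)

print(total_rows)               # 10000000
print(rows_visited_per_batch)   # 100000, matching "[100000..100000] total rows"
print(rows_inserted_per_batch)  # 50000, matching "[50000..50000] rows"
```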
>
> The results that I get are the following for consecutive runs with the same
> configuration as above:
>
> Run  Total_ops  Op_rate (ops/s)  Partition_rate (pk/s)  Row_rate (rows/s)  Time (s)
> 1        56          19               1885                  943246           3.0
> 2        46          46               4648                 2325498           1.0
> 3        27          30               2982                 1489870           0.9
> 4        59          19               1932                  966034           3.1
> 5       100          17               1730                  865182           5.8
>
> Now, what I need to understand is the following:
>
> 1. Which of these metrics is the throughput, i.e., the number of records
>    inserted per second? Is it the Row_rate, the Op_rate, or the
>    Partition_rate? If it is the Row_rate, can I safely conclude that I am
>    able to insert close to 1 million records per second? Any thoughts on
>    what the Op_rate and Partition_rate mean in this case?
> 2. Why does Total_ops vary so drastically between runs? Does the number
>    of threads have anything to do with this variation? What can I
>    conclude here about the stability of my Cassandra setup?
> 3. How do I determine the batch size per thread here? In my example, is
>    the batch size 50000?
>
> Thanks in advance.



-- 
http://twitter.com/tjake
