cassandra-commits mailing list archives

From "Ben Slater (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress
Date Mon, 10 Oct 2016 10:00:31 GMT


Ben Slater commented on CASSANDRA-12490:

Hi Benedict,

I must be missing something here because as far as I can tell from testing a few different
scenarios, setting -pop seq=1..N doesn't have any impact on the set of data generated when
used with a YAML file.

That aside, the intent is that you use the SEQ distribution for doing an initial load of background
data before running say a read test or a mixed read/write test so that you are running with
a representative volume of data on disk (and that you probably wouldn't use SEQ for
these later tests). In that case you wouldn't expect/care whether the set of data generated
initially lines up in the same order as what is generated by later runs (although you would
expect them to be from the same overall populations of values which I believe does hold).
I believe the sequence of data generation would have to change similarly if you changed between
existing distribution types between runs?

Looking again at the code, I can see how the current implementation of SEQ is an issue for
implementing future data validation, as it doesn't "reset" as you visit each partition.
I think the other distributions effectively reset due to the call to setSeed(). However, I
think this can fairly easily be rectified by having the setSeed() implementation of DistributionSequence
reset the next value to 0?
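
A minimal sketch of the reset idea, for illustration only. The class name ResettableSequence and its methods are hypothetical and are not the code in the attached patch; the sketch assumes the sequence tracks an offset from the range minimum and that setSeed() is called once per partition visit.

```java
// Hypothetical illustration; not the actual cassandra-stress implementation.
class ResettableSequence {
    private final long min;
    private final long max;
    private long offset; // position in the range, relative to min

    ResettableSequence(long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("min must be <= max");
        }
        this.min = min;
        this.max = max;
    }

    // Returns the next value in min..max, wrapping back to min after max.
    long next() {
        long value = min + offset;
        offset = (offset + 1) % (max - min + 1);
        return value;
    }

    // Assumption: the seed value itself is ignored. Reseeding only rewinds
    // the sequence, so each partition visit replays the same values.
    void setSeed(long seed) {
        offset = 0;
    }
}
```

With this behaviour, two visits that each call setSeed() first would see the same sequence of values, which is the property a later validation run would rely on.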


> Add sequence distribution type to cassandra stress
> --------------------------------------------------
>                 Key: CASSANDRA-12490
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>             Fix For: 3.10
>         Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
> When using the write command, cassandra stress sequentially generates seeds. This ensures
> generated values don't overlap (unless the sequence wraps), providing a more predictable number
> of inserted records (and generating a base set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available. I think
> it would be useful to have this for doing initial load of data for testing.

This message was sent by Atlassian JIRA
