cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress
Date Sun, 09 Oct 2016 17:22:20 GMT


Benedict commented on CASSANDRA-12490:

I'm afraid I think this was a terrible idea, and it should probably be rolled back.  The example
yaml permits its use as a column value seed generator, which means the contents of a partition
no longer depend on the partition's seed, but on the order of visitation.  

For partition and clustering columns (as in the example) this breaks behaviour for queries.
Stress no longer knows what records exist (it will generate different values to query than
those it originally wrote).
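
To make the failure mode concrete, here is a minimal Python sketch (not stress's actual code). It compares a value derived purely from the partition seed with a value handed out by a running sequence. The names `seeded_value` and `SequenceValue`, and the column name `ck`, are hypothetical and exist only for this illustration.

```python
import itertools
import random


def seeded_value(partition_seed: int, column: str) -> int:
    """Hypothetical helper: derive a column value purely from the partition seed."""
    rng = random.Random(f"{partition_seed}:{column}")
    return rng.randrange(1_000_000)


class SequenceValue:
    """Hypothetical generator: returns the next number on every call,
    regardless of which partition is being populated."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)

    def next_value(self) -> int:
        return next(self._counter)


# The write phase visits seeds 1..3; a later read phase revisits them in another order.
write_order = [1, 2, 3]
read_order = [3, 1, 2]

# Seed-derived values: regenerating for the same seed always gives the same value.
seeded_written = {s: seeded_value(s, "ck") for s in write_order}
seeded_queried = {s: seeded_value(s, "ck") for s in read_order}

# Sequence-derived values: the value depends on when the partition was visited.
seq = SequenceValue()
sequence_written = {s: seq.next_value() for s in write_order}
seq = SequenceValue()  # a fresh run, e.g. a later read command
sequence_queried = {s: seq.next_value() for s in read_order}

print(seeded_written == seeded_queried)      # True
print(sequence_written == sequence_queried)  # False
```

With seed-derived values the reader can regenerate exactly what was written. With the sequence, the same partition gets a different value as soon as the visitation order changes.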

It also completely breaks any possibility of data validation, which is currently supported
for thrift and was always intended to be extended to CQL to improve testing.

As already mentioned, the -pop seq=1..N mode can be provided on the command line for sequentially
visiting partitions.  For generating *values* that can step forwards with this, the most sensible
design (and what had been on the cards) is to accept a functional specification that depends
on the seed of the partition, the simplest being to return 1 when the partition's seed was

> Add sequence distribution type to cassandra stress
> --------------------------------------------------
>                 Key: CASSANDRA-12490
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>             Fix For: 3.10
>         Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
> When using the write command, cassandra stress sequentially generates seeds. This ensures
> generated values don't overlap (unless the sequence wraps), providing a more predictable number
> of inserted records (and generating a base set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available. I think
> it would be useful to have this for doing initial load of data for testing.

This message was sent by Atlassian JIRA