Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Sun, 9 Oct 2016 17:22:20 +0000 (UTC)
From: "Benedict (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12998165.1471518195000.777345.1476033740461@Atlassian.JIRA>
In-Reply-To: <JIRA.12998165.1471518195000@Atlassian.JIRA>
References: <JIRA.12998165.1471518195000@Atlassian.JIRA> <JIRA.12998165.1471518195745@arcas>
Subject: [jira] [Commented] (CASSANDRA-12490) Add sequence distribution type
 to cassandra stress
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Sun, 09 Oct 2016 17:22:22 -0000


    [ https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560318#comment-15560318 ] 

Benedict commented on CASSANDRA-12490:
--------------------------------------

I'm afraid I think this was a terrible idea, and it should probably be rolled back. The example yaml permits its use as a seed generator for column values, which means the contents of a partition no longer depend on the partition's seed, but on the order of visitation.

For partition and clustering columns (as in the example) this breaks queries: stress no longer knows which records exist, because it will generate different values to query than it originally wrote.

It also completely breaks any possibility of data validation, which is currently supported for thrift and was always intended to be extended to CQL to improve testing.
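
To make the failure mode concrete, here is a minimal sketch in Python. It is not stress's actual code: `SequenceGenerator`, `seeded_value` and `populate` are hypothetical names invented for illustration. It compares a stateful sequence, whose output depends on the order of visitation, with a value derived purely from the partition's seed.

```python
import random


class SequenceGenerator:
    """Hypothetical stateful generator: each call returns the next integer."""

    def __init__(self, start=1):
        self._next = start

    def next_value(self):
        value = self._next
        self._next += 1
        return value


def seeded_value(partition_seed):
    """Hypothetical seed-derived generator: the same seed always gives the same value."""
    return random.Random(partition_seed).randrange(1_000_000)


def populate(seeds, value_for):
    """Map each visited partition seed to the value generated for it."""
    return {seed: value_for(seed) for seed in seeds}


write_order = [1, 2, 3]  # order in which partitions were written
read_order = [3, 1, 2]   # a later run visits the same partitions in another order

# Stateful sequence: the value depends on when the partition is visited.
seq_for_write = SequenceGenerator()
written_seq = populate(write_order, lambda _seed: seq_for_write.next_value())
seq_for_read = SequenceGenerator()
expected_seq = populate(read_order, lambda _seed: seq_for_read.next_value())

# Seed-derived value: the value depends only on the partition's seed.
written_seeded = populate(write_order, seeded_value)
expected_seeded = populate(read_order, seeded_value)

sequence_is_reproducible = written_seq == expected_seq
seeded_is_reproducible = written_seeded == expected_seeded
```

In this sketch the second run cannot reconstruct what the sequence wrote for a given partition, so it can neither query for it nor validate it; the seed-derived values match regardless of visitation order.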

As already mentioned, the -pop seq=1..N mode can be provided on the command line for sequentially visiting partitions.  For generating *values* that can step forwards with this, the most sensible design (and what had been on the cards) is to accept a functional specification that depends on the seed of the partition, the simplest being to return 1 when the partition's seed was 1.
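
As a rough sketch of that design (Python for brevity, with hypothetical names `identity_spec` and `visit_sequentially`; this is an assumption about how such a specification could look, not an existing stress feature): the value is a pure function of the partition's seed, and the sequential stepping comes entirely from visiting seeds in order.

```python
def identity_spec(partition_seed):
    """Simplest functional specification: seed 1 yields value 1, seed 2 yields 2, ..."""
    return partition_seed


def visit_sequentially(first_seed, last_seed, spec):
    """Visit seeds first_seed..last_seed in order (a stand-in for a sequential
    population such as seq=1..N) and apply the spec to each seed."""
    return [spec(seed) for seed in range(first_seed, last_seed + 1)]


# Writing: values step forwards because the seeds do.
written = visit_sequentially(1, 5, identity_spec)

# Querying or validating later: any partition can be revisited in any order
# and the value is recomputed from its seed alone.
revisited = {seed: identity_spec(seed) for seed in (4, 1, 3)}
```

Because the generator holds no state, the written values remain sequential while queries and validation can still recompute the expected value for any partition.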

> Add sequence distribution type to cassandra stress
> --------------------------------------------------
>
>                 Key: CASSANDRA-12490
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12490
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>             Fix For: 3.10
>
>         Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
>
>
> When using the write command, cassandra stress sequentially generates seeds. This ensures generated values don't overlap (unless the sequence wraps), providing a more predictable number of inserted records (and generating a base set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available. I think it would be useful to have this for doing the initial load of data for testing.

