cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7519) Further stress improvements to generate more realistic workloads
Date Sun, 24 Aug 2014 01:35:11 GMT


T Jake Luciani commented on CASSANDRA-7519:

Ran some tests and tweaked the schema from the blog post, and things look better.  I do have
some further questions/suggestions besides the better names.

- What is the point of batchcount?  A batch exists to group inserts into a single
statement for the server, so why would you send several of them sequentially? Even though
it's possible, I can't think of a realistic workload that would use it.

- I think it would be helpful to output some information on the partition sizes and batch
sizes for inserts to give people a sense of what their selected values will do, like:

  Partitions: Min of X, Max of Y
  Rows per partition: Min of X, Max of Y

Per Batch:
  Partitions: Min of X, Max of Y
  Rows per partition: Min of X, Max of Y
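
A rough sketch of how such a summary could be computed from a sample of generated batches.
This is not stress code: the function name, the input shape (each batch is a list of
(partition_key, row_count) pairs) and the result keys are all assumptions made for
illustration, and it assumes at least one non-empty batch.

```python
from collections import defaultdict


def summarize_batches(batches):
    """Return (min, max) summaries for a non-empty sample of batches.

    Hypothetical input shape: each batch is a list of
    (partition_key, row_count) pairs.
    """
    total_rows = defaultdict(int)   # rows per partition across all batches
    partitions_per_batch = []       # distinct partitions touched by each batch
    rows_per_batch_partition = []   # rows per partition within a single batch

    for batch in batches:
        rows_in_batch = defaultdict(int)
        for key, rows in batch:
            rows_in_batch[key] += rows
            total_rows[key] += rows
        partitions_per_batch.append(len(rows_in_batch))
        rows_per_batch_partition.extend(rows_in_batch.values())

    return {
        "partitions": len(total_rows),
        "rows_per_partition": (min(total_rows.values()), max(total_rows.values())),
        "batch_partitions": (min(partitions_per_batch), max(partitions_per_batch)),
        "batch_rows_per_partition": (
            min(rows_per_batch_partition),
            max(rows_per_batch_partition),
        ),
    }


if __name__ == "__main__":
    sample = [[("a", 2), ("b", 1)], [("a", 3)]]
    summary = summarize_batches(sample)
    lo, hi = summary["rows_per_partition"]
    print(f"Rows per partition: Min of {lo}, Max of {hi}")
    lo, hi = summary["batch_partitions"]
    print(f"Per Batch partitions: Min of {lo}, Max of {hi}")
```

Printing these figures before the run starts would let people see what their selected
distributions actually produce.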

> Further stress improvements to generate more realistic workloads
> ----------------------------------------------------------------
>                 Key: CASSANDRA-7519
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>              Labels: tools
>             Fix For: 2.1.1
> We generally believe that the most common workload is for reads to exponentially prefer
> the most recently written data. However, as stress currently behaves, we have two id generation
> modes: sequential and random (although random can be distributed). I propose introducing a
> new mode which is somewhat like sequential, except we essentially 'look back' from the current
> id by some amount defined by a distribution. I may also make the position increment only
> when an id is first written, so that this mode can be run from a clean slate with a mixed
> workload. This should allow us to generate workloads that are more representative.
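
The 'look back' mode described above might be sketched as follows. This is only an
illustration and not the stress implementation: the class name, the choice of an
exponential distribution for the look-back amount, and the mean_lookback parameter are
assumptions. Writes advance the position; reads pick an id some distance behind the newest
written id, so recently written ids are read most often.

```python
import random


class LookbackIds:
    """Hypothetical id generator: sequential writes, look-back reads."""

    def __init__(self, mean_lookback=100.0, seed=None):
        self._rng = random.Random(seed)
        self._rate = 1.0 / mean_lookback  # lambda of the exponential distribution
        self._written = 0                 # number of ids written so far

    def next_write(self):
        # The position only advances when an id is first written.
        new_id = self._written
        self._written += 1
        return new_id

    def next_read(self):
        if self._written == 0:
            raise ValueError("nothing has been written yet")
        newest = self._written - 1
        # Look back from the newest id by an exponentially distributed amount,
        # clamped so we never go below the first id.
        offset = int(self._rng.expovariate(self._rate))
        return max(0, newest - offset)


if __name__ == "__main__":
    ids = LookbackIds(mean_lookback=10.0, seed=1)
    for _ in range(1000):
        ids.next_write()
    print([ids.next_read() for _ in range(5)])
```

Because reads are only valid once something has been written, a mixed workload can start
from a clean slate: the first operations are necessarily writes.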
> At the same time, I will introduce a timestamp value generator for primary key columns
> that is strictly ascending, i.e. has some random component but is based on the actual
> system time (or some shared monotonically increasing state) so that we can again generate
> a more realistic workload. This may be challenging to tie in with the new procedurally generated
> partitions, but I'm sure it can be done without too much difficulty.
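
A minimal sketch of a strictly ascending timestamp generator along the lines described
above, assuming microsecond resolution and a bounded random jitter. The class name, the
jitter bound and the injectable clock are all hypothetical; nothing here is taken from the
stress code.

```python
import random
import threading
import time


class AscendingTimestamps:
    """Hypothetical generator: system time plus jitter, strictly ascending."""

    def __init__(self, max_jitter_us=1000, clock=time.time, seed=None):
        self._rng = random.Random(seed)
        self._max_jitter_us = max_jitter_us
        self._clock = clock            # returns seconds as a float
        self._last = 0                 # shared monotonically increasing state
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            now_us = int(self._clock() * 1_000_000)
            candidate = now_us + self._rng.randint(0, self._max_jitter_us)
            # Never return a value at or below the previous one, even if the
            # clock stalls or the jitter happens to move us backwards.
            self._last = max(candidate, self._last + 1)
            return self._last


if __name__ == "__main__":
    gen = AscendingTimestamps(seed=42)
    print([gen.next() for _ in range(3)])
```

The lock keeps the last-issued value consistent when several worker threads share one
generator.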
