cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8597) Stress: make simple things simple
Date Mon, 12 Jan 2015 18:26:35 GMT


Benedict commented on CASSANDRA-8597:

There is a FIXED distribution: if you want exactly 1M, why not use it? With a depth of
3, as stated, FIXED(100) for each clustering column would do the trick (100 x 100 x 100 = 1M).
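
As a quick sanity check of that arithmetic, here is a minimal sketch (plain Python, not
part of stress; the helper name `rows_per_partition` is made up for illustration) that
multiplies the fixed fan-out of each clustering column to get the rows in one partition:

```python
from math import prod


def rows_per_partition(fanouts):
    """Rows in one partition when every clustering column has a fixed fan-out.

    `fanouts` holds one integer per clustering column, outermost first.
    This is a hypothetical helper for illustration, not a stress API.
    """
    return prod(fanouts)


# Three clustering columns, each FIXED(100).
print(rows_per_partition([100, 100, 100]))
```

With three columns at 100 each this comes to one million rows per partition.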

If we re-envisage the way we define the distribution, as I alluded to in #2, you could define
the total number of rows you want in the partition. But then conceptualising how those rows
are distributed amongst the clustering columns becomes hard, and a different PITA. You'd need
two knobs per clustering column: the share of fan-out it should adopt, and the variance
between each value. Understanding how these interact (both intra-tier and inter-tier) would
be really quite difficult for people to think about, which is why I originally chose to let
the distribution be configured per clustering column. Defining the total does, however, also
solve your problem #2. It's a more powerful way of specifying, but I'm concerned that stress
is already considered difficult to understand.
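
To make the "share of fan-out" knob a little more concrete, here is a rough sketch of one
possible interpretation. It is an assumption for discussion, not how stress works today:
each clustering column gets a share, the shares sum to 1, and the column's mean fan-out is
the total row count raised to that share, so the product of the fan-outs is the total. The
name `fanouts_from_shares` and its parameters are hypothetical, and the second knob
(variance) is deliberately left out.

```python
import math


def fanouts_from_shares(total_rows, shares):
    """Split a target row count into per-column mean fan-outs.

    Assumed model (not stress behaviour): fan-out_i = total_rows ** share_i,
    with the shares summing to 1, so the fan-outs multiply back to total_rows.
    """
    if not math.isclose(sum(shares), 1.0):
        raise ValueError("shares must sum to 1")
    return [total_rows ** share for share in shares]


# Half of the fan-out on the first clustering column, a quarter on each of the others.
fanouts = fanouts_from_shares(1_000_000, [0.5, 0.25, 0.25])
print(fanouts)
```

Even in this simplified form the results are non-integer means that would need rounding,
and adding variance per tier on top is where it gets hard to reason about.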

> Stress: make simple things simple
> ---------------------------------
>                 Key: CASSANDRA-8597
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: T Jake Luciani
>             Fix For: 2.1.3
> Some of the trouble people have with stress is a documentation problem, but some is functional.
> Comments from [~iamaleksey]:
> # 3 clustering columns, make a million cells in a single partition, should be simple,
> but it's not. have to tweak 'clustering' on the three columns just right to make stress work
> at all. w/ some values it just gets stuck forever computing batches
> # for others, it generates huge, megabyte-size batches, utterly disrespecting 'select'
> clause in 'insert'
> #  I want a sequential generator too, to be able to predict deterministic result sets.
> uniform() only gets you so far
> # impossible to simulate a time series workload
> /cc [~jshook] [~aweisberg] [~benedict]

