Thanks. 

On Mon, May 27, 2019 at 12:24 PM Alain RODRIGUEZ <arodrime@gmail.com> wrote:
Hello Carl,

What you're trying to do sounds like a good match for one of the tools we open-sourced and actively maintain: https://github.com/thelastpickle/tlp-stress.

TLP Stress lets you use predefined profiles (see https://github.com/thelastpickle/tlp-stress/tree/master/src/main/kotlin/com/thelastpickle/tlpstress/profiles) or create your own profiles and/or schemas; contributions are welcome. You can tune the workload, the read/write ratio, the number of distinct partitions, the number of operations to run, and more.
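For a feel of what those knobs control, here is a minimal hand-rolled sketch in Kotlin against the 3.x DataStax Java driver (com.datastax.driver.core). This is not tlp-stress code, and the contact point, keyspace, and table are placeholders; tlp-stress handles all of this (plus concurrency, metrics, etc.) for you:

    import com.datastax.driver.core.Cluster
    import kotlin.random.Random

    fun main() {
        // Placeholder contact point; point this at the cluster under test.
        val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
        val session = cluster.connect()

        // Throwaway keyspace/table just to have something to read and write.
        session.execute("CREATE KEYSPACE IF NOT EXISTS stress WITH replication = " +
            "{'class': 'SimpleStrategy', 'replication_factor': 1}")
        session.execute("CREATE TABLE IF NOT EXISTS stress.kv (pk text PRIMARY KEY, value text)")

        val insert = session.prepare("INSERT INTO stress.kv (pk, value) VALUES (?, ?)")
        val select = session.prepare("SELECT value FROM stress.kv WHERE pk = ?")

        // The knobs mentioned above: total operations, read/write mix, and how
        // many distinct partitions the keys are spread over.
        val operations = 100_000
        val readRate = 0.2
        val distinctPartitions = 10_000

        repeat(operations) { i ->
            val key = "key-${Random.nextInt(distinctPartitions)}"
            if (Random.nextDouble() < readRate) {
                session.execute(select.bind(key))
            } else {
                session.execute(insert.bind(key, "value-$i"))
            }
        }

        cluster.close()
    }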

You might need multiple clients to maximize throughput, depending on the instances in use and your own testing goals.

In case it might be of some use as well, we like to combine it with another of our tools, TLP Cluster (https://github.com/thelastpickle/tlp-cluster). With it we can easily create and destroy Cassandra environments on AWS, including the Cassandra servers, clients, and monitoring (Prometheus).

You can have a look anyway; I think both projects might be of interest for reaching your goal.

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting


On Thu, May 23, 2019 at 9:25 PM Carl Mueller <carl.mueller@smartthings.com.invalid> wrote:
Does anyone have a schema / schema generator that can be used for general testing and that covers lots of complicated aspects and data?

For example, it would have a bunch of different rk/ck variations, column data types, and altered/added columns and data (which can impact sstables and compaction).

Mischievous data to prepopulate (such as https://github.com/minimaxir/big-list-of-naughty-strings for strings, ugly keys in maps, semi-evil column names), of sufficient size to land on most nodes of a 3-5 node cluster.

Superwide rows
Large key values

Version-specific stuff for 2.1, 2.2, 3.x, and 4.x

I'd be happy to centralize this in a GitHub repo if it doesn't exist anywhere yet; a rough sketch of what I mean is below.
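To make it concrete, here is the sort of starting point I have in mind, sketched in Kotlin against the 3.x DataStax Java driver. The keyspace/table names are placeholders, and the handful of hard-coded "naughty" values would really come from the big-list-of-naughty-strings repo; a full version would generate many more rk/ck and type variations:

    import com.datastax.driver.core.Cluster
    import java.util.Date
    import java.util.UUID

    fun main() {
        val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
        val session = cluster.connect()

        session.execute("CREATE KEYSPACE IF NOT EXISTS torture WITH replication = " +
            "{'class': 'SimpleStrategy', 'replication_factor': 1}")

        // Composite partition key, two clustering columns, and a spread of column types.
        session.execute("""
            CREATE TABLE IF NOT EXISTS torture.mixed (
                pk1 text, pk2 uuid,
                ck1 timestamp, ck2 int,
                v_text text, v_map map<text, text>, v_list list<int>, v_set set<text>,
                PRIMARY KEY ((pk1, pk2), ck1, ck2)
            )""")

        val insert = session.prepare(
            "INSERT INTO torture.mixed (pk1, pk2, ck1, ck2, v_text, v_map, v_list, v_set) " +
                "VALUES (?, ?, ?, ?, ?, ?, ?, ?)")

        // Tiny hard-coded sample standing in for big-list-of-naughty-strings.
        val naughty = listOf("", " ", "null", "'; DROP TABLE users; --", "\uD83D\uDCA9")

        for ((i, value) in naughty.withIndex()) {
            // One partition per naughty value; crank up the row count (and the
            // size of v_text) to get the superwide-row / large-value cases.
            val partition = UUID.randomUUID()
            for (row in 0 until 100) {
                session.execute(insert.bind(
                    value, partition, Date(), row,
                    value,
                    mapOf(value to value),   // ugly keys in maps
                    listOf(i, row),
                    setOf(value)))
            }
        }

        cluster.close()
    }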