cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8929) Workload sampling
Date Fri, 06 Mar 2015 21:26:39 GMT


Robert Stupp commented on CASSANDRA-8929:

That's what I meant: a tool that operates on the recorded statements. Recording the CQL statements
(along with some state such as prepared statements) doesn't seem super-complicated or super-intrusive
in the code path. It clearly adds some CPU and I/O overhead when turned on, plus some contention
(multiple connections writing to a single trace). In the worst case it could slow down the node
if we don't handle that situation (e.g. by dropping some trace information if the trace disk is too slow).
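
To illustrate the "drop some trace information" idea, here is a minimal sketch in Python (chosen for brevity; Cassandra itself is Java). It assumes a bounded in-memory queue between the request path and a single writer. `TraceRecorder`, `record`, `drain` and the `sink` callable are hypothetical names and not part of any existing Cassandra API.

```python
import queue
import threading


class TraceRecorder:
    """Hypothetical recorder: request threads enqueue, a writer drains to a sink."""

    def __init__(self, sink, capacity=1024):
        self._queue = queue.Queue(maxsize=capacity)  # bounded buffer
        self._sink = sink      # assumed: any callable that persists one record
        self._lock = threading.Lock()
        self.dropped = 0       # records discarded because the buffer was full

    def record(self, statement):
        """Called on the request path; never blocks on a slow trace disk."""
        try:
            self._queue.put_nowait(statement)
            return True
        except queue.Full:
            # The writer cannot keep up: drop the record instead of stalling.
            with self._lock:
                self.dropped += 1
            return False

    def drain(self):
        """Write everything currently queued to the sink; return the count written."""
        written = 0
        while True:
            try:
                item = self._queue.get_nowait()
            except queue.Empty:
                return written
            self._sink(item)
            written += 1
```

The `dropped` counter matters here: a trace with gaps is still usable as long as the analysis tool knows how much is missing.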

Technically that recording would be a trace of everything "on the wire", enriched by some additional
information such as a dump of all prepared statements at the beginning of the trace.
We could get a lot of information from such a trace: not just every native protocol operation,
but also network-related information such as the number of established or closed connections, or whether
a connection uses SSL, is authenticated and so on.

Regarding the goal: I don't see upgrade acceptance tests as the only goal. It would also make it
possible to analyze operations that happen during some time frame, as part of "bug fixing" or regular
QA, and to compare the workloads of different client application versions. To be clear: IMO it's
not meant to be some kind of "security audit".

A minimal playback tool would just issue N percent (1..100) of the DML statements contained in the trace
and maybe simulate multiple connections. Everything on top of that would be out of scope (for
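
The "N percent of all DML statements" selection could be sketched as follows, again in Python and under stated assumptions: the trace is available as an iterable of CQL strings, and a statement counts as DML if it starts with one of the keywords listed below. `DML_KEYWORDS`, `is_dml` and `sample_dml` are hypothetical names, and simulating multiple connections is left out.

```python
import random

# Assumption: a statement is treated as DML if it starts with one of these keywords.
DML_KEYWORDS = ("SELECT", "INSERT", "UPDATE", "DELETE")


def is_dml(statement):
    """Return True if the first word of the statement is a DML keyword."""
    words = statement.lstrip().split(None, 1)
    return bool(words) and words[0].upper() in DML_KEYWORDS


def sample_dml(statements, percent, seed=None):
    """Yield roughly `percent` percent (1..100) of the DML statements, in order."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be in 1..100")
    rng = random.Random(seed)  # a fixed seed makes a playback run repeatable
    for stmt in statements:
        # rng.random() is in [0, 1), so percent=100 keeps every DML statement.
        if is_dml(stmt) and rng.random() * 100 < percent:
            yield stmt
```

Each statement is kept or skipped independently, so the tool never has to load the whole trace into memory. The result is only approximately N percent of the statements.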

> Workload sampling
> -----------------
>                 Key: CASSANDRA-8929
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
> Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build something
> almost as useful by sampling the requests sent to a node and building a synthetic workload
> with the same characteristics using the same (or anonymized) schema.

This message was sent by Atlassian JIRA
