cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8929) Workload sampling
Date Fri, 06 Mar 2015 20:39:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350856#comment-14350856 ]

Benedict commented on CASSANDRA-8929:
-------------------------------------

So the goal is for users to do this as an acceptance phase prior to deploying an upgrade?

We can certainly work to make it easier to produce a good stress profile (manually or otherwise),
and I think better example profiles for the testing we do ourselves will go a long way towards this.


I do like the _idea_ of automatic generation, but it's not a simple task, and it would touch
quite a few integral code paths. At minimum, for each update, we need to sample the presence,
size and compressibility of each column, along with the frequency distribution of partition
key participation and CQL row participation (i.e. for each partition key, we need to reconstruct
the distribution of updates across the rows within it). Simply collecting this is non-trivial.
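To make that list concrete, here is a minimal Python sketch of the per-update bookkeeping. It is not Cassandra code: `SampledUpdate`, `ColumnStats` and `WorkloadSampler` are hypothetical names, updates are assumed to arrive already decoded into a partition key, a CQL row (clustering) key and a map of column name to serialized bytes, and the zlib compression ratio is only a stand-in for "compressibility".

```python
import zlib
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class SampledUpdate:
    """Hypothetical decoded form of one sampled write."""
    partition_key: str
    row_key: str   # clustering key identifying the CQL row
    columns: dict  # column name -> serialized value (bytes)


@dataclass
class ColumnStats:
    present: int = 0                            # updates that set this column
    sizes: list = field(default_factory=list)   # raw value sizes in bytes
    ratios: list = field(default_factory=list)  # zlib-compressed size / raw size


class WorkloadSampler:
    """Accumulates per-update statistics that a stress profile could be fitted from."""

    def __init__(self):
        self.updates = 0
        self.columns = defaultdict(ColumnStats)
        self.partition_hits = Counter()       # partition key -> number of updates
        self.row_hits = defaultdict(Counter)  # partition key -> (row key -> updates)

    def observe(self, update):
        self.updates += 1
        self.partition_hits[update.partition_key] += 1
        self.row_hits[update.partition_key][update.row_key] += 1
        for name, value in update.columns.items():
            stats = self.columns[name]
            stats.present += 1
            stats.sizes.append(len(value))
            if value:  # skip empty values to avoid dividing by zero
                stats.ratios.append(len(zlib.compress(value)) / len(value))

    def presence(self, name):
        """Fraction of sampled updates that set the given column."""
        if not self.updates or name not in self.columns:
            return 0.0
        return self.columns[name].present / self.updates
```

Each `observe` call is cheap in isolation, but running it for every column of every sampled update is exactly the kind of intrusion into core code paths described above. Note also that zlib's framing overhead makes very small values look incompressible (ratio above 1).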

Constructing a profile from this data, once stress supports all of the functionality
encountered, probably isn't very challenging conceptually, as we can calculate a best-fit
distribution for the data we've sampled. It's still a significant chunk of work, though. I do
wonder whether we could instead create a tool that generates the profile from an analysis of
sstables combined with some user-provided data, since it would be easier to build and maintain
if it weren't intertwined with the C* code, possibly alongside some very simple sampling of
just the frequency of each distinct CQL statement.
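The "best-fit distribution" step and the simple statement-frequency sampling could look roughly like the following Python sketch. Assumptions: the candidate families (fixed, uniform, Gaussian, exponential) are compared by maximum log-likelihood with no penalty for parameter count, which is a simplification and not something stress itself does; `best_fit`, `normalize_cql` and `StatementSampler` are hypothetical names; statements are grouped by a crude regex that replaces string and numeric literals with `?`. Translating the fitted parameters into stress's own distribution syntax is deliberately left out.

```python
import math
import random
import re
from collections import Counter


def best_fit(samples):
    """Return (family, params) for the candidate with the highest MLE log-likelihood."""
    xs = [float(x) for x in samples]
    if not xs:
        raise ValueError("need at least one sample")
    n = len(xs)
    lo, hi = min(xs), max(xs)
    if lo == hi:
        return "fixed", {"value": lo}  # every sample identical

    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # MLE (population) stdev
    gauss_ll = sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )
    candidates = {
        "uniform": ({"min": lo, "max": hi}, -n * math.log(hi - lo)),
        "gaussian": ({"mean": mu, "stdev": sigma}, gauss_ll),
    }
    if lo >= 0:  # exponential only makes sense for non-negative data (mu > 0 here)
        rate = 1.0 / mu
        candidates["exponential"] = ({"rate": rate}, n * math.log(rate) - rate * sum(xs))

    name = max(candidates, key=lambda k: candidates[k][1])
    return name, candidates[name][0]


# Single-quoted CQL strings (with '' escapes) or bare numeric literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize_cql(statement):
    """Replace literals with '?' and collapse whitespace so equivalent statements group."""
    return " ".join(_LITERAL.sub("?", statement).split())


class StatementSampler:
    """Bernoulli-samples statements at a fixed rate and counts their normalized forms."""

    def __init__(self, rate, seed=None):
        self.rate = rate
        self.rng = random.Random(seed)
        self.counts = Counter()

    def offer(self, statement):
        if self.rng.random() < self.rate:
            self.counts[normalize_cql(statement)] += 1

    def frequencies(self):
        total = sum(self.counts.values())
        return {stmt: c / total for stmt, c in self.counts.items()} if total else {}
```

For example, `best_fit(range(1, 101))` picks `"uniform"`, and a `StatementSampler(rate=0.01)` fed every request would keep roughly 1% of them while still estimating relative statement frequencies without bias. Sampling at a fixed rate keeps the per-request cost to one random draw, which is the "very simple" end of the spectrum discussed above.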

> Workload sampling
> -----------------
>
>                 Key: CASSANDRA-8929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>
> Workload *recording* looks to be unworkable (CASSANDRA-6572). We could build something
> almost as useful by sampling the requests sent to a node and building a synthetic workload
> with the same characteristics using the same (or anonymized) schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
