cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
Date Wed, 17 Dec 2014 18:17:14 GMT

[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250242#comment-14250242 ]

Ariel Weisberg edited comment on CASSANDRA-8503 at 12/17/14 6:17 PM:
---------------------------------------------------------------------

I think there are two general classes of benchmarks you would run in CI: representative user
workloads and targeted microbenchmark workloads. Targeted workloads are a huge help during
ongoing development because they magnify the impact of regressions from code changes that
are harder to notice in representative workloads, and they point to the specific subsystem
being benchmarked.

I will just cover the microbenchmarks. The full matrix is large, so there is an element of
wanting ponies, but the reality is that all of them are interesting from the perspective of
preventing performance regressions and understanding the impact of ongoing changes.

Benchmark the stress client: give it excess server capacity and a single client, testing lots
of small messages, then lots of large messages, i.e. work the servers can answer as fast as
possible. The flip side of this workload is the same thing but for the server, where you
measure how many trivially answerable tiny queries you can push through a cluster given excess
client capacity. When testing the server this might also be when you test the matrix of
replication and consistency levels.
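
As a rough sketch of what those invocations might look like (host, counts, and sizes are
placeholders I am inventing for illustration, with flags as I recall them from the 2.1 stress
tool):

    # Tiny messages: small fixed values the servers can answer as fast as possible
    cassandra-stress write n=10000000 cl=ONE -col size=FIXED(32) \
        -rate threads=200 -node 10.0.0.1

    # Same shape but with fat values, to exercise lots of large messages
    cassandra-stress write n=1000000 cl=ONE -col size=FIXED(65536) \
        -rate threads=50 -node 10.0.0.1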

Benchmark performance of non-prepared statements.

Benchmark performance of preparing statements?
 
A full test matrix for data-intensive workloads would test read, write, and 50/50, and for
a bonus 90/10. Single-cell partitions with a small value and a large value, and a range of
wide rows (small, medium, large). All three compaction strategies with compression on/off.
Data-intensive workloads also need to run against both spinning rust and SSDs.
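
For illustration only (schema sub-options and sizes are my assumptions, not something this
ticket specifies, so worth double-checking against the stress help), a few cells of that
matrix could be expressed directly as stress invocations:

    # 50/50 read/write mix against a LeveledCompactionStrategy table
    cassandra-stress mixed ratio(write=1,read=1) n=10000000 \
        -schema "compaction(strategy=LeveledCompactionStrategy)" \
        -node 10.0.0.1

    # 90/10 read-heavy bonus case; wide partitions approximated with many columns
    cassandra-stress mixed ratio(read=9,write=1) n=10000000 \
        -col n=FIXED(1000) size=FIXED(64) -node 10.0.0.1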

CQL-specific microbenchmarks against specific CQL datatypes. If there are interactions that
are important we should capture those.
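
Stress user profiles seem like the natural vehicle here. A minimal sketch of a profile
touching a few datatypes (keyspace, table, and distributions are all made up for
illustration):

    # datatype-probe.yaml -- hypothetical profile exercising several CQL types
    keyspace: stress_types
    table: typeprobe
    table_definition: |
      CREATE TABLE typeprobe (
        key uuid PRIMARY KEY,
        ts timestamp,
        body blob,
        tags set<text>,
        attrs map<text, text>
      )
    columnspec:
      - name: body
        size: uniform(64..8192)
    insert:
      partitions: fixed(1)
    queries:
      point:
        cql: SELECT * FROM typeprobe WHERE key = ?

Something like cassandra-stress user profile=datatype-probe.yaml ops(insert=1,point=1)
n=1000000 would then drive it.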

Counters

Lightweight transactions
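
These two probably need their own profiles. Counters are already first-class stress commands;
LWT would presumably go through a user profile (the profile contents below are invented, and
I have not verified that profiles accept conditional statements):

    # Counter workloads: dedicated stress commands exist for these
    cassandra-stress counter_write n=1000000 cl=QUORUM -node 10.0.0.1
    cassandra-stress counter_read n=1000000 cl=QUORUM -node 10.0.0.1

    # Hypothetical LWT workload via a user profile query such as:
    #   lwt_insert:
    #     cql: INSERT INTO users (id, name) VALUES (?, ?) IF NOT EXISTS
    cassandra-stress user profile=lwt.yaml ops(lwt_insert=1) n=100000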

The matrix also needs to include different permutations of replication strategies and
consistency levels. Maybe we can constrain those variations to the parts of the matrix that
would best reflect the impact of replication strategy and CL, probably a subset of the
data-intensive workloads.
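
Consistency level is a per-command sub-option and replication settings can be set via
-schema, so sweeping them is cheap. A sketch (hosts and counts invented):

    # Vary the replication factor when the schema is created...
    cassandra-stress write n=1000000 -schema "replication(factor=3)" -node 10.0.0.1

    # ...then sweep consistency levels over a fixed read workload
    for cl in ONE QUORUM ALL; do
        cassandra-stress read n=1000000 cl=$cl -node 10.0.0.1
    done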

We also want a workload targeting the row cache and key cache, both when everything is cached
and when there is a realistic long tail of data not in the cache.
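
The stress population options can approximate both cases; a sketch (key-space sizes
invented):

    # Fully cached: hot set small enough to fit in the row/key caches
    cassandra-stress read n=10000000 -pop dist=UNIFORM(1..100000) -node 10.0.0.1

    # Realistic long tail: skewed access over a much larger key space
    cassandra-stress read n=10000000 -pop dist=EXTREME(1..100000000,2) -node 10.0.0.1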

For every workload, to really get the value, you would like a graph for throughput and a graph
for latency at some percentile, with a data point per revision tested going back to the
beginning, as well as a 90 day graph. A trend line also helps. Then someone has to own
monitoring the graphs and poking people when there is an issue.

The workflow usually goes something like this: the monitor tags the author of the suspected
bad revision, who triages it and either fixes it or hands it off to the correct person.
Timeliness is really important, because once regressions start stacking up it's a pain to know
whether you have done what you should to fix yours.



> Collect important stress profiles for regression analysis done by jenkins
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Ryan McGuire
>            Assignee: Ryan McGuire
>
> We have a weekly job set up on CassCI to run a performance benchmark against the dev branches as well as the last stable releases.
> Here's an example:
> http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f
> This test is currently pretty basic: it's running on three nodes with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email.
> Ideas:
>  * Timeseries (Can this be done with stress? not sure)
>  * compact storage
>  * compression off
>  * ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
