cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11853) Improve Cassandra-Stress latency measurement
Date Tue, 24 May 2016 02:16:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297538#comment-15297538
] 

T Jake Luciani commented on CASSANDRA-11853:
--------------------------------------------

I'm re-running with multiple settings to see how it changes.

Looking at the code, my main question is about the UniformRateLimiter.
If I understand the code correctly, UniformRateLimiter schedules operations at a fixed interval
measured from the time the limiter was constructed, so when an operation is ready to run it gets
its expected start time from its absolute operation number.

I see two problems with this:
   * The rate limiter is created at startup and doesn't account for warmup, HotSpot compilation, etc., so
once warmed up the ops are already behind schedule.  This explains the [initial latency spike|http://cstar.datastax.com/graph?command=one_job&stats=022678d8-2123-11e6-bcd7-0256e416528f&metric=99th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=549.67&ymin=0&ymax=318.01]
in the run, which skews the overall results.  The limiter start time should only be set once
the actual measured ops are ready to start.
   * If the rate limit is set too high, such that stress can't keep up with the expected rate,
the results will make no sense: the actual start time will be far later than the limiter's calculated
start time.
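
To make the two points above concrete, here is a minimal sketch of a uniform limiter whose
schedule origin is set explicitly once warmup is over, and which exposes how far an operation
is behind its intended start. This is not the UniformRateLimiter from the branch; the class
name, the start() method and lagNs() are hypothetical names used only for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT the UniformRateLimiter from the co-correction branch.
final class UniformRateLimiterSketch
{
    private final long intervalNs; // fixed spacing between intended start times
    private boolean started;       // nanoTime values may be negative, so use a flag
    private long startNs;          // schedule origin, set only when measurement begins
    private long opIndex;          // absolute operation number since start()

    UniformRateLimiterSketch(int opsPerSec)
    {
        this.intervalNs = TimeUnit.SECONDS.toNanos(1) / opsPerSec;
    }

    // Call once warmup is done, so warmup time does not count against the schedule.
    void start(long nowNs)
    {
        startNs = nowNs;
        opIndex = 0;
        started = true;
    }

    // Intended start time of the next operation: origin + index * interval.
    long nextIntendedStartNs()
    {
        if (!started)
            throw new IllegalStateException("start() must be called before scheduling ops");
        return startNs + (opIndex++) * intervalNs;
    }

    // How late an operation actually started relative to its intended start (0 if on time).
    // A lag that keeps growing suggests the requested rate is higher than stress can sustain.
    static long lagNs(long intendedStartNs, long actualStartNs)
    {
        return Math.max(0, actualStartNs - intendedStartNs);
    }
}
```

A caller would invoke start(System.nanoTime()) right before the measured run and could report
a warning when lagNs() keeps increasing from one operation to the next.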
   
It would be very good if we could add some way of detecting local GC pauses, like we do in
[GCInspector|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/GCInspector.java];
otherwise users have no way of knowing whether the latency is due to local pauses or server pauses.
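
As a rough idea of what client-side detection could look like, the sketch below polls the
standard java.lang.management GC beans and reports the collection time accumulated since the
previous call. It is not a port of GCInspector; the class and method names are hypothetical,
and getCollectionTime() is only an approximation of pause time (concurrent collectors report
elapsed collection time, not just stop-the-world pauses).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: polls JVM-wide GC time in the stress client process.
final class LocalGcTracker
{
    private long lastTotalMs = totalGcMillis();

    // Sum of approximate accumulated collection time (ms) across all collectors.
    static long totalGcMillis()
    {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
        {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0)
                total += t;
        }
        return total;
    }

    // GC time (ms) accumulated since the previous call, or since construction.
    long gcMillisSinceLastCall()
    {
        long now = totalGcMillis();
        long delta = now - lastTotalMs;
        lastTotalMs = now;
        return delta;
    }
}
```

Calling gcMillisSinceLastCall() once per reporting interval would let stress print local GC
time next to the latency figures for that interval.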

General comments/nits on the branch:
   * The [code style|https://wiki.apache.org/cassandra/CodeStyle] needs to be fixed (break
on bracket, etc.)
   * HdrHistogram also needs to be added to the Maven/POM dependencies in build.xml
   * Comments on top-level classes like UniformRateLimiter would be helpful for future readers.
   


> Improve Cassandra-Stress latency measurement
> --------------------------------------------
>
>                 Key: CASSANDRA-11853
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11853
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Nitsan Wakart
>            Assignee: Nitsan Wakart
>             Fix For: 3.x
>
>
> Currently CS reports latency using a sampling latency container and reports service
time (as opposed to response time from the intended schedule), leading to coordinated omission.
> Fixed here:
> https://github.com/nitsanw/cassandra/tree/co-correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
