cassandra-commits mailing list archives

From "Li Zou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5932) Speculative read performance data show unexpected results
Date Wed, 02 Oct 2013 15:48:43 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784089#comment-13784089 ]

Li Zou commented on CASSANDRA-5932:
-----------------------------------

This test result is reasonable and matches expectations.

For the PERCENTILE / CUSTOM configurations, the larger {{cfs.sampleLatencyNanos}} is, the smaller
the throughput impact on normal operations before the outage. During the outage period, however,
the situation is reversed: the smaller {{cfs.sampleLatencyNanos}} is, the smaller the throughput
impact, because reads time out sooner and trigger the speculative retries earlier.
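The percentile-based threshold trade-off above can be sketched as follows. This is a simplified nearest-rank percentile over a list of recent read latencies, not Cassandra's actual histogram-based sampling, and the sample values are purely hypothetical:

```python
import math

def percentile_threshold_nanos(latencies_nanos, percentile):
    """Pick the speculative-retry threshold as the given percentile of
    recent read latencies (nearest-rank method; a rough stand-in for
    how a percentile setting maps to a timeout like sampleLatencyNanos)."""
    ranked = sorted(latencies_nanos)
    rank = max(0, math.ceil(percentile / 100.0 * len(ranked)) - 1)
    return ranked[rank]

# Hypothetical recent read latencies in nanoseconds.
samples = [1_000_000, 2_000_000, 3_000_000, 4_000_000, 100_000_000]

# A high percentile tolerates even the slowest sample, so retries fire
# rarely in normal operation but react slowly to an outage.
p99 = percentile_threshold_nanos(samples, 99)   # 100_000_000 ns

# A lower percentile times out much sooner: more retries in steady
# state, but a faster reaction when a replica stops answering.
p50 = percentile_threshold_nanos(samples, 50)   # 3_000_000 ns
```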

For the ALWAYS configuration, since it always sends one speculative read in addition to the
usual read requests, its throughput for normal operations before the outage should be lower
than that of PERCENTILE / CUSTOM. But because the speculative retries are always already in
flight, its throughput impact during the outage period should be the smallest. The testing
results confirm this.
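The contrast between the policies can be modelled in a few lines. This is a hypothetical sketch of how many extra (speculative) requests each policy issues per read, not Cassandra's implementation; the 5 ms threshold is an assumed value:

```python
def extra_requests(policy, first_response_nanos, threshold_nanos):
    """Return how many speculative read requests are sent for one read.
    Simplified model of the policies discussed above: ALWAYS pays for an
    extra request up front on every read; PERCENTILE / CUSTOM send one
    only if the first replica has not answered within the threshold."""
    if policy == "ALWAYS":
        return 1
    if policy in ("PERCENTILE", "CUSTOM"):
        return 1 if first_response_nanos > threshold_nanos else 0
    return 0  # NONE: never speculate

THRESHOLD = 5_000_000  # assumed 5 ms retry threshold

# Normal operation: the replica answers in 2 ms, so only ALWAYS pays
# the cost of a speculative request (hence its lower steady-state rate).
normal_always = extra_requests("ALWAYS", 2_000_000, THRESHOLD)      # 1
normal_pct = extra_requests("PERCENTILE", 2_000_000, THRESHOLD)     # 0

# Outage: the replica never answers (modelled as infinite latency);
# PERCENTILE retries too, but only after waiting out the threshold,
# while ALWAYS already had the retry in flight.
outage_pct = extra_requests("PERCENTILE", float("inf"), THRESHOLD)  # 1
```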


> Speculative read performance data show unexpected results
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-5932
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5932
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan McGuire
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.0.2
>
>         Attachments: 5932.6692c50412ef7d.compaction.png, 5932-6692c50412ef7d.png, 5932.6692c50412ef7d.rr0.png,
5932.ded39c7e1c2fa.logs.tar.gz, 5932.txt, 5933-128_and_200rc1.png, 5933-7a87fc11.png, 5933-logs.tar.gz,
5933-randomized-dsnitch-replica.2.png, 5933-randomized-dsnitch-replica.3.png, 5933-randomized-dsnitch-replica.png,
compaction-makes-slow.png, compaction-makes-slow-stats.png, eager-read-looks-promising.png,
eager-read-looks-promising-stats.png, eager-read-not-consistent.png, eager-read-not-consistent-stats.png,
node-down-increase-performance.png
>
>
> I've done a series of stress tests with eager retries enabled that show undesirable behavior.
I'm grouping these behaviors into one ticket as they are most likely related.
> 1) Killing off a node in a 4 node cluster actually increases performance.
> 2) Compactions make nodes slow, even after the compaction is done.
> 3) Eager Reads tend to lessen the *immediate* performance impact of a node going down,
but not consistently.
> My Environment:
> 1 stress machine: node0
> 4 C* nodes: node4, node5, node6, node7
> My script:
> node0 writes some data: stress -d node4 -F 30000000 -n 30000000 -i 5 -l 2 -K 20
> node0 reads some data: stress -d node4 -n 30000000 -o read -i 5 -K 20
> h3. Examples:
> h5. A node going down increases performance:
> !node-down-increase-performance.png!
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.just_20.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> At 450s, I kill -9 one of the nodes. There is a brief decrease in performance as the
snitch adapts, but then it recovers... to even higher performance than before.
> h5. Compactions make nodes permanently slow:
> !compaction-makes-slow.png!
> !compaction-makes-slow-stats.png!
> The green and orange lines represent trials with eager retry enabled; they never recover
their pre-compaction op-rate, while the red and blue lines do.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.compaction.2.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> h5. Speculative Read tends to lessen the *immediate* impact:
> !eager-read-looks-promising.png!
> !eager-read-looks-promising-stats.png!
> This graph looked the most promising to me: the two trials with eager retry (the green
and orange lines) showed the smallest dip in performance at 450s.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> h5. But not always:
> !eager-read-not-consistent.png!
> !eager-read-not-consistent-stats.png!
> This is a retrial with the same settings as above, yet the 95th-percentile eager retry
(red line) did poorly this time at 450s.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.just_20.rc1.try2.json&metric=interval_op_rate&operation=stress-read&smoothing=1]



--
This message was sent by Atlassian JIRA
(v6.1#6144)
