cassandra-commits mailing list archives

From "Li Zou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5932) Speculative read performance data show unexpected results
Date Fri, 04 Oct 2013 20:35:47 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786585#comment-13786585 ]

Li Zou commented on CASSANDRA-5932:
-----------------------------------

I have done some testing using today's trunk and observed the following issues.

*Issue 1* -- The first {{MessagingService.addCallback()}} method (i.e. the one without the ConsistencyLevel
argument) fails its assertion.

Commenting out the assert statement seems to work, but the Cassandra servers themselves then
periodically produce a 10-second outage (i.e. zero transactions from the client's point of view).

*Issue 2* -- Speculative Retry seems to stop retrying during the outage window.

During the outage window -- whether triggered by killing one of the Cassandra nodes or produced by
the Cassandra servers themselves -- JConsole shows that the JMX SpeculativeRetry counter stops
incrementing until gossip detects the outage.
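(For anyone who wants to watch this outside JConsole, a minimal standalone sketch is below. The JMX endpoint {{node4:7199}} and the metric ObjectName -- keyspace {{Keyspace1}} and table {{Standard1}} from the default stress schema, and the exact metric name -- are assumptions; adjust them to whatever JConsole shows on your build.)

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SpeculativeRetryWatcher
{
    public static void main(String[] args) throws Exception
    {
        // Default Cassandra JMX port is 7199; "node4" is one of the nodes in the test cluster.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://node4:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Assumed ObjectName for the per-table speculative retry counter; verify the exact
            // path in JConsole under the org.apache.cassandra.metrics domain.
            ObjectName counter = new ObjectName(
                    "org.apache.cassandra.metrics:type=ColumnFamily,"
                    + "keyspace=Keyspace1,scope=Standard1,name=SpeculativeRetries");
            // Sample once per second; during the outage window the count should keep
            // rising if speculative retry is actually firing.
            for (int i = 0; i < 600; i++)
            {
                Object count = mbs.getAttribute(counter, "Count");
                System.out.println(System.currentTimeMillis() + " SpeculativeRetries=" + count);
                Thread.sleep(1000);
            }
        }
        finally
        {
            connector.close();
        }
    }
}
{code}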

What is the reason for this? Speculative Retry is meant to help precisely during the outage period.
This observed behavior is consistent with what I saw on Cassandra 2.0.0-rc2.
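(For reference, the retry mode under test is controlled by the per-table {{speculative_retry}} option. A minimal sketch of setting it with the DataStax Java driver follows; the contact point and the {{Keyspace1.Standard1}} table from the default stress schema are assumptions.)

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SetSpeculativeRetry
{
    public static void main(String[] args)
    {
        // Contact point and table names are assumptions (default legacy stress schema).
        Cluster cluster = Cluster.builder().addContactPoint("node4").build();
        try
        {
            Session session = cluster.connect();
            // Accepted values in 2.0: 'NONE', 'ALWAYS', '<N>percentile', '<N>ms'.
            session.execute("ALTER TABLE \"Keyspace1\".\"Standard1\" WITH speculative_retry = '95percentile'");
        }
        finally
        {
            cluster.shutdown();
        }
    }
}
{code}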


> Speculative read performance data show unexpected results
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-5932
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5932
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan McGuire
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.0.2
>
>         Attachments: 5932.6692c50412ef7d.compaction.png, 5932-6692c50412ef7d.png, 5932.6692c50412ef7d.rr0.png,
> 5932.6692c50412ef7d.rr1.png, 5932.ded39c7e1c2fa.logs.tar.gz, 5932.txt, 5933-128_and_200rc1.png,
> 5933-7a87fc11.png, 5933-logs.tar.gz, 5933-randomized-dsnitch-replica.2.png, 5933-randomized-dsnitch-replica.3.png,
> 5933-randomized-dsnitch-replica.png, compaction-makes-slow.png, compaction-makes-slow-stats.png,
> eager-read-looks-promising.png, eager-read-looks-promising-stats.png, eager-read-not-consistent.png,
> eager-read-not-consistent-stats.png, node-down-increase-performance.png
>
>
> I've done a series of stress tests with eager retries enabled that show undesirable behavior.
> I'm grouping these behaviours into one ticket as they are most likely related.
> 1) Killing off a node in a 4 node cluster actually increases performance.
> 2) Compactions make nodes slow, even after the compaction is done.
> 3) Eager Reads tend to lessen the *immediate* performance impact of a node going down,
> but not consistently.
> My Environment:
> 1 stress machine: node0
> 4 C* nodes: node4, node5, node6, node7
> My script:
> node0 writes some data: stress -d node4 -F 30000000 -n 30000000 -i 5 -l 2 -K 20
> node0 reads some data: stress -d node4 -n 30000000 -o read -i 5 -K 20
> h3. Examples:
> h5. A node going down increases performance:
> !node-down-increase-performance.png!
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.just_20.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> At 450s, I kill -9 one of the nodes. There is a brief decrease in performance as the
> snitch adapts, but then it recovers... to even higher performance than before.
> h5. Compactions make nodes permanently slow:
> !compaction-makes-slow.png!
> !compaction-makes-slow-stats.png!
> The green and orange lines represent trials with eager retry enabled; they never recover
> their op-rate from before the compaction as the red and blue lines do.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.compaction.2.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> h5. Speculative Read tends to lessen the *immediate* impact:
> !eager-read-looks-promising.png!
> !eager-read-looks-promising-stats.png!
> This graph looked the most promising to me: the two trials with eager retry, the green
> and orange lines, at 450s showed the smallest dip in performance.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> h5. But not always:
> !eager-read-not-consistent.png!
> !eager-read-not-consistent-stats.png!
> This is a retrial with the same settings as above, yet the 95percentile eager retry (red
> line) did poorly this time at 450s.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.just_20.rc1.try2.json&metric=interval_op_rate&operation=stress-read&smoothing=1]



