cassandra-commits mailing list archives

From "Li Zou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5932) Speculative read performance data show unexpected results
Date Tue, 24 Sep 2013 21:55:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776805#comment-13776805 ]

Li Zou commented on CASSANDRA-5932:
-----------------------------------

Hello [~iamaleksey],

Thanks for the link to this JIRA and for your very detailed testing results. They confirm what
we have seen in our lab testing of the Cassandra 2.0.0-rc2 "Speculative Execution for Reads" feature.

We have a very simple data center setup consisting of four Cassandra nodes running on four
server machines. A testing application (Cassandra client) interacts with Cassandra nodes
1, 2 and 3; that is, the testing app is not directly connected to Cassandra node 4.

The keyspace Replication Factor is set to 3 and the client-requested Consistency Level is
set to CL_TWO.

I have tested all three configurations of Speculative Execution for Reads ('ALWAYS',
'85 PERCENTILE', '50 MS' / '100 MS'), and none of them works as expected. Judging from the
test app's log file, they all show a 20-second window of outage immediately after the 4th
node is killed. This behavior is the same as in Cassandra 1.2.4.

I have done a quick reading of the Cassandra server implementation (the Cassandra 2.0.0 tarball)
and noticed some design issues that I would like to discuss with you.

*Issue 1* - StorageProxy.fetchRows() may still block for as long as conf.read_request_timeout_in_ms,
even though the speculative retry fired correctly after Cassandra node 4 was killed.

Take the speculative configuration 'PERCENTILE' / 'CUSTOM' as an example. After Cassandra
node 4 is killed, SpeculativeReadExecutor.speculate() blocks waiting for responses; if it times
out, it sends one more read request to an alternative node (from {{unfiltered}}) and increments
the speculativeRetry counter. This part should work.
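
For reference, here is a rough sketch of that retry path as I read it. It is only a paraphrase:
{{unfiltered}}, {{handler.endpoints}}, {{command.createMessage()}} and {{MessagingService.sendRR()}}
come from the real classes, while the {{await()}} call, {{thresholdMillis}} and the {{speculativeRetry}}
counter object below are placeholder names, not the actual 2.0.0 identifiers.

{noformat}
    // Sketch only -- paraphrases the speculate() flow described above; placeholder names noted above.
    void speculate()
    {
        // wait for the initially-dispatched reads, up to the speculative retry threshold
        if (!handler.await(thresholdMillis))
        {
            // timed out: send the same read to one more replica that was held back in 'unfiltered'
            // (assumes 'unfiltered' contains at least one replica beyond the initial set)
            InetAddress extra = unfiltered.get(handler.endpoints.size());
            MessagingService.instance().sendRR(command.createMessage(), extra, handler);
            speculativeRetry.inc();   // bump the speculative retry counter
        }
    }
{noformat}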

However, killing the 4th node very likely causes inconsistency in the database, which triggers
a DigestMismatchException. When fetchRows() handles the DigestMismatchException, it uses
handler.endpoints to send out the digest-mismatch retries and then blocks for the responses.
Since one of those endpoints has already been killed, handler.get() blocks until it times out,
which takes 10 seconds.


{noformat}
                catch (DigestMismatchException ex)
                {
                    Tracing.trace("Digest mismatch: {}", ex);

                    ...

                    MessageOut<ReadCommand> message = exec.command.createMessage();
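                    // NOTE: exec.handler.endpoints still includes the killed node, so one of these full-data reads will never return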
                    for (InetAddress endpoint : exec.handler.endpoints)
                    {
                        Tracing.trace("Enqueuing full data read to {}", endpoint);
                        MessagingService.instance().sendRR(message, endpoint, repairHandler);
                    }
                }
            }

            ...

            // read the results for the digest mismatch retries
            if (repairResponseHandlers != null)
            {
                for (int i = 0; i < repairCommands.size(); i++)
                {
                    ReadCommand command = repairCommands.get(i);
                    ReadCallback<ReadResponse, Row> handler = repairResponseHandlers.get(i);

                    Row row;
                    try
                    {
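                        // blocks until enough responses arrive or read_request_timeout_in_ms (10 seconds by default) elapses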
                        row = handler.get();
                    }
{noformat}



*Issue 2* - The speculative 'ALWAYS' does NOT send out any additional read requests. Thus, in
the face of the failure of node 4, it does not help at all.

SpeculateAlwaysExecutor.executeAsync() sends out only handler.endpoints.size() read requests
and then blocks for the responses to come back. If one of those nodes is killed, say node 4,
speculative retry 'ALWAYS' behaves the same way as Cassandra 1.2.4, i.e. it blocks until it
times out after 10 seconds.

??My understanding is that speculative retry 'ALWAYS' should ALWAYS send out "handler.endpoints.size()
+ 1" read requests and block for handler.endpoints.size() responses??.
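
A rough sketch of the change I have in mind (not an actual patch; only {{handler.endpoints}},
{{unfiltered}}, {{command.createMessage()}} and {{MessagingService.sendRR()}} are taken from the
quoted 2.0.0 source, while the {{targets}} list and the way the extra replica is picked are
illustrative):

{noformat}
    // Proposed 'ALWAYS' behavior (sketch only): dispatch one more read than we block for,
    // so the loss of any single replica does not force the full read_request_timeout_in_ms wait.
    void executeAsync()
    {
        int blockFor = handler.endpoints.size();   // number of responses we actually wait for
        List<InetAddress> targets = unfiltered.subList(0, Math.min(unfiltered.size(), blockFor + 1));

        MessageOut<ReadCommand> message = command.createMessage();
        for (InetAddress endpoint : targets)
            MessagingService.instance().sendRR(message, endpoint, handler);

        // the handler still blocks for only 'blockFor' responses, so one dead replica is tolerated
    }
{noformat}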

*Issue 3* - Since the ReadRepairDecision is determined by a Random() number, this speculative
retry may not work, as the ReadRepairDecision may turn out to be ??ReadRepairDecision.GLOBAL??.

*Issue 4* - For the ReadExecutor(s), {{this.unfiltered}} and {{this.endpoints}} may not be
consistent. Thus, using {{this.unfiltered}} and {{this.endpoints}} for the speculative retry may
cause unexpected results. This is especially true when the Consistency Level is {{LOCAL_QUORUM}}
and the ReadRepairDecision is {{DC_LOCAL}}.




                
> Speculative read performance data show unexpected results
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-5932
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5932
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan McGuire
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.0.2
>
>         Attachments: compaction-makes-slow.png, compaction-makes-slow-stats.png, eager-read-looks-promising.png,
eager-read-looks-promising-stats.png, eager-read-not-consistent.png, eager-read-not-consistent-stats.png,
node-down-increase-performance.png
>
>
> I've done a series of stress tests with eager retries enabled that show undesirable behavior.
I'm grouping these behaviours into one ticket as they are most likely related.
> 1) Killing off a node in a 4 node cluster actually increases performance.
> 2) Compactions make nodes slow, even after the compaction is done.
> 3) Eager Reads tend to lessen the *immediate* performance impact of a node going down,
but not consistently.
> My Environment:
> 1 stress machine: node0
> 4 C* nodes: node4, node5, node6, node7
> My script:
> node0 writes some data: stress -d node4 -F 30000000 -n 30000000 -i 5 -l 2 -K 20
> node0 reads some data: stress -d node4 -n 30000000 -o read -i 5 -K 20
> h3. Examples:
> h5. A node going down increases performance:
> !node-down-increase-performance.png!
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.just_20.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> At 450s, I kill -9 one of the nodes. There is a brief decrease in performance as the
snitch adapts, but then it recovers... to even higher performance than before.
> h5. Compactions make nodes permanently slow:
> !compaction-makes-slow.png!
> !compaction-makes-slow-stats.png!
> The green and orange lines represent trials with eager retry enabled; they never recover
their pre-compaction op-rate, as the red and blue lines do.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.compaction.2.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> h5. Speculative Read tends to lessen the *immediate* impact:
> !eager-read-looks-promising.png!
> !eager-read-looks-promising-stats.png!
> This graph looked the most promising to me: the two trials with eager retry (the green
and orange lines) showed the smallest dip in performance at 450s.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> h5. But not always:
> !eager-read-not-consistent.png!
> !eager-read-not-consistent-stats.png!
> This is a retrial with the same settings as above, yet the 95th percentile eager retry (red
line) did poorly this time at 450s.
> [Data for this test here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.just_20.rc1.try2.json&metric=interval_op_rate&operation=stress-read&smoothing=1]

