cassandra-commits mailing list archives

From "Simon Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-13261) Improve speculative retry to avoid being overloaded
Date Thu, 23 Feb 2017 23:33:44 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-13261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Zhou updated CASSANDRA-13261:
-----------------------------------
    Attachment: CASSANDRA-13261-v1.patch

I'm not sure what the next 3.0.* release will be, and 3.0.11 was just merged to trunk. The attached
patch is against trunk, but I'd like this improvement included in the next 3.0.* release as well.

[~tjake] Maybe you could help review this patch, since you have some context from CASSANDRA-13009?

> Improve speculative retry to avoid being overloaded
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13261
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13261
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>         Attachments: CASSANDRA-13261-v1.patch
>
>
> In CASSANDRA-13009, it was suggested that I separate the second part of my patch out as an improvement.
> This is to avoid Cassandra being overloaded when a CUSTOM speculative retry parameter is used.
> Steps to reason about/reproduce this with 3.0.10:
> 1. Use a custom speculative retry threshold like this:
> cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms';
> 2. SpeculatingReadExecutor will be used, according to this piece of code in AbstractReadExecutor:
> {code}
>         if (retry.equals(SpeculativeRetryParam.ALWAYS))
>             return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, consistencyLevel, targetReplicas);
>         else // PERCENTILE or CUSTOM.
>             return new SpeculatingReadExecutor(keyspace, cfs, command, consistencyLevel, targetReplicas);
> {code}
> 3. When RF=3 and LOCAL_QUORUM is used, the code below (from SpeculatingReadExecutor#maybeTryAdditionalReplicas) won't protect Cassandra from being overloaded, even though the inline comment suggests that intention:
> {code}
>             // no latency information, or we're overloaded
>             if (cfs.sampleLatencyNanos > TimeUnit.MILLISECONDS.toNanos(command.getTimeout()))
>                 return;
> {code}
> The reason is that cfs.sampleLatencyNanos is assigned from retryPolicy.threshold(), which is the 10ms configured in step #1 above, at line 405 of ColumnFamilyStore.
> However, the timeout is usually the default 5000ms, so the overload check above can never trigger and speculative reads are always issued.
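> A minimal standalone sketch of why that guard can never trigger with the static 10ms threshold (the class and variable names here are illustrative, not taken from the patch):
> {code}
> import java.util.concurrent.TimeUnit;
>
> public class OverloadCheckDemo
> {
>     public static void main(String[] args)
>     {
>         // What line 405 of ColumnFamilyStore stores into sampleLatencyNanos when the
>         // table is configured with speculative_retry = '10ms' (step #1 above).
>         long sampleLatencyNanos = TimeUnit.MILLISECONDS.toNanos(10);
>
>         // The default read timeout, 5000ms.
>         long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(5000);
>
>         // The guard in maybeTryAdditionalReplicas only skips speculation when the
>         // "sampled" latency exceeds the timeout; 10ms > 5000ms is never true, so
>         // speculative reads are always issued, even when the node is overloaded.
>         boolean skipSpeculation = sampleLatencyNanos > timeoutNanos;
>         System.out.println("treated as overloaded: " + skipSpeculation); // prints false
>     }
> }
> {code}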
> As the name suggests, sampleLatencyNanos should hold the sampled latency, not something configured "statically". My proposal (a rough sketch follows below):
> a. Introduce the option -Dcassandra.overload.threshold to allow customizing the overload threshold. The default threshold would be DatabaseDescriptor.getRangeRpcTimeout().
> b. Assign the sampled P99 latency to cfs.sampleLatencyNanos. For overload detection, we just compare cfs.sampleLatencyNanos with the customizable threshold above.
> c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) as the waiting time before retry (see line 282 of AbstractReadExecutor). This is the value from the table setting (PERCENTILE or CUSTOM).
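> Below is a minimal standalone sketch of the proposed behaviour, just to illustrate a/b/c. The names and the 10000ms fallback are placeholders (the real default would come from DatabaseDescriptor.getRangeRpcTimeout()); this is not code from the attached patch:
> {code}
> import java.util.concurrent.TimeUnit;
>
> public class SpeculationSketch
> {
>     // (a) Overload threshold read from -Dcassandra.overload.threshold (in milliseconds),
>     // falling back here to a placeholder 10000ms.
>     static final long OVERLOAD_THRESHOLD_NANOS =
>             TimeUnit.MILLISECONDS.toNanos(Long.getLong("cassandra.overload.threshold", 10000L));
>
>     // (b) Overload detection compares the sampled P99 latency, not the table setting,
>     // against the threshold above.
>     static boolean overloaded(long sampledP99LatencyNanos)
>     {
>         return sampledP99LatencyNanos > OVERLOAD_THRESHOLD_NANOS;
>     }
>
>     // (c) The wait before speculating comes from the table's retry setting
>     // (PERCENTILE or CUSTOM), kept separate from the sampled latency.
>     static long retryDelayNanos(long tableRetryThresholdNanos)
>     {
>         return tableRetryThresholdNanos;
>     }
>
>     public static void main(String[] args)
>     {
>         long sampledP99 = TimeUnit.MILLISECONDS.toNanos(8);    // e.g. measured P99 latency
>         long tableSetting = TimeUnit.MILLISECONDS.toNanos(10); // speculative_retry = '10ms'
>
>         if (overloaded(sampledP99))
>             System.out.println("overloaded; skip the speculative read");
>         else
>             System.out.println("wait " + retryDelayNanos(tableSetting) + "ns, then speculate");
>     }
> }
> {code}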



