cassandra-commits mailing list archives

From "Russell Alexander Spitzer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7821) Add Optional Backoff on Retry to Cassandra Stress
Date Mon, 25 Aug 2014 19:44:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109626#comment-14109626 ]

Russell Alexander Spitzer commented on CASSANDRA-7821:
------------------------------------------------------

Attached a patch which adds two simple options to C* Stress:
{code}
backoff_strategy = {CONSTANT,LINEAR,EXPONENTIAL} 
     CONSTANT   : Waits the same backoff_seconds before every retry
     LINEAR     : Waits retry_num * backoff_seconds
     EXPONENTIAL: Waits backoff_seconds * 2 ^ retry_num

backoff_seconds = #
     The number of seconds used as the coefficient in the above strategies
{code}
https://github.com/RussellSpitzer/cassandra/compare/RussellSpitzer:cassandra-2.1...CASSANDRA-7821
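A minimal Java sketch of how the three strategies could turn backoff_seconds and a retry number into a delay. The enum name `BackoffStrategy`, the method `delaySeconds`, the range guard and the 1-based retry numbering are assumptions for illustration; they are not taken from the patch.

```java
// Hypothetical sketch: the names mirror the backoff_strategy and
// backoff_seconds options above but are not taken from the patch.
enum BackoffStrategy {
    CONSTANT, LINEAR, EXPONENTIAL;

    /**
     * Seconds to wait before retry number retryNum, assuming retries are
     * numbered from 1 (the first retry is retryNum == 1).
     */
    long delaySeconds(long backoffSeconds, int retryNum) {
        if (retryNum < 1 || retryNum > 62) {
            // Keeps 1L << retryNum positive; the patch's real limits are unknown.
            throw new IllegalArgumentException("retryNum out of range: " + retryNum);
        }
        switch (this) {
            case CONSTANT:
                return backoffSeconds;                                  // same delay every time
            case LINEAR:
                return Math.multiplyExact(backoffSeconds, (long) retryNum); // grows by backoffSeconds per retry
            case EXPONENTIAL:
                return Math.multiplyExact(backoffSeconds, 1L << retryNum);  // doubles on every retry
            default:
                throw new AssertionError("unknown strategy: " + this);
        }
    }
}
```

With backoff_seconds = 5, the third retry would wait 5, 15 or 40 seconds under CONSTANT, LINEAR and EXPONENTIAL respectively. `Math.multiplyExact` throws `ArithmeticException` instead of silently overflowing for very large coefficients.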

I also raised the thread timeout to 10 minutes, but ideally we would pass through the maximum expected total retry time instead.
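The maximum expected total retry time can be worked out from the strategy definitions. Assuming retries are numbered k = 1, ..., n, the base delay is b = backoff_seconds, and the time spent executing the operations themselves is ignored, the worst-case total sleep is:

```latex
\begin{aligned}
T_{\text{CONSTANT}}    &= \sum_{k=1}^{n} b = n\,b \\
T_{\text{LINEAR}}      &= \sum_{k=1}^{n} b\,k = b\,\frac{n(n+1)}{2} \\
T_{\text{EXPONENTIAL}} &= \sum_{k=1}^{n} b\,2^{k} = b\,\bigl(2^{n+1} - 2\bigr)
\end{aligned}
```

For example, with the illustrative values b = 1 and n = 10 these give 10 s, 55 s and 2046 s (about 34 minutes), so the exponential case alone would already exceed a fixed 10 minute timeout.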
[~benedict] As usual, your feedback would be extremely welcome.

> Add Optional Backoff on Retry to Cassandra Stress
> -------------------------------------------------
>
>                 Key: CASSANDRA-7821
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7821
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> Currently, when stress is running against a cluster which occasionally has nodes marked as down, it will almost immediately stop. This occurs because the retry loop can execute extremely quickly if each execution terminates with a {{com.datastax.driver.core.exceptions.NoHostAvailableException}} or {{com.datastax.driver.core.exceptions.UnavailableException}}.
> In the case of these exceptions, stress will most likely be unable to succeed if the retries are performed as fast as possible. To get around this, we could add an optional delay on retries, giving the cluster time to recover rather than terminating the stress run.
> We could make this configurable, with options such as:
> * Constant # Delays the same amount after each retry
> * Linear # Backs off a set amount * the trial number
> * Exponential # Backs off a set amount * 2 ^ trial number
> This may also require adjusting the "thread is stuck" check to make sure that the maximum retry time will not cause the thread to be terminated early.
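The retry behaviour described above, sleeping between attempts only on transient failures, might look like the following Java sketch. The class `RetryWithBackoff`, its `Sleeper` hook and the predicate parameter are hypothetical; the driver exceptions are left out so the example compiles without the DataStax driver, and the delay for each retry is passed in as a function.

```java
import java.util.concurrent.Callable;
import java.util.function.IntToLongFunction;
import java.util.function.Predicate;

// Hypothetical sketch of a retry loop with backoff; not the cassandra-stress code.
final class RetryWithBackoff {
    /** Sleeps for the given number of milliseconds; injectable so tests need not wait. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    private RetryWithBackoff() {}

    /**
     * Runs op, retrying up to maxRetries times when isRetryable accepts the failure.
     * Before retry k (numbered from 1) it sleeps delayMillisForRetry(k) milliseconds.
     * The last failure is rethrown once retries run out or the failure is not retryable.
     */
    static <T> T run(Callable<T> op,
                     int maxRetries,
                     IntToLongFunction delayMillisForRetry,
                     Predicate<Exception> isRetryable,
                     Sleeper sleeper) throws Exception {
        for (int retry = 0; ; retry++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (retry >= maxRetries || !isRetryable.test(e)) {
                    throw e;
                }
                // Give the cluster time to recover instead of retrying immediately.
                // An interrupt during the sleep propagates to the caller.
                sleeper.sleep(delayMillisForRetry.applyAsLong(retry + 1));
            }
        }
    }
}
```

In stress itself the sleeper could simply be `Thread::sleep`, and the predicate might test for `NoHostAvailableException` or `UnavailableException`; the sum of all delays is then the figure the "thread is stuck" check would need to tolerate.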



--
This message was sent by Atlassian JIRA
(v6.2#6252)
