cassandra-commits mailing list archives

From "Brandon Williams (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-3273) FailureDetector can take a very long time to mark a host down
Date Thu, 29 Sep 2011 04:22:45 GMT
FailureDetector can take a very long time to mark a host down
-------------------------------------------------------------

                 Key: CASSANDRA-3273
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3273
             Project: Cassandra
          Issue Type: Bug
          Components: API
            Reporter: Brandon Williams
            Assignee: Brandon Williams


There are two ways to trigger this:

* Bring a node up very briefly in a mixed-version cluster and then terminate it
* Bring a node up, terminate it for a very long time, then bring it back up and take it down
again

In the first case, the versioning logic forces a reconnect, which can happen very quickly,
so a very short arrival interval gets recorded. This is easily solved by rejecting any
interval that falls within some reasonable bound, for instance the gossiper interval.
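
A minimal sketch of that kind of filter, to make the idea concrete. It assumes "within a
reasonable bound" means "shorter than a lower bound", and the class name, method names, and
the bound passed in are all hypothetical; this is not the FailureDetector's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch only; not Cassandra's actual arrival window. */
class BoundedArrivalWindow {
    // Assumed lower bound on accepted intervals, e.g. the gossip interval.
    private final long minIntervalMillis;
    private final int maxSize;
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long lastArrivalMillis = -1;

    BoundedArrivalWindow(long minIntervalMillis, int maxSize) {
        this.minIntervalMillis = minIntervalMillis;
        this.maxSize = maxSize;
    }

    /** Records a heartbeat arrival; returns true if its interval was kept. */
    boolean add(long nowMillis) {
        boolean kept = false;
        if (lastArrivalMillis >= 0) {
            long interval = nowMillis - lastArrivalMillis;
            // Drop intervals below the bound so a rapid reconnect
            // cannot pollute the window.
            if (interval >= minIntervalMillis) {
                if (intervals.size() == maxSize) {
                    intervals.removeFirst(); // evict the oldest sample
                }
                intervals.addLast(interval);
                kept = true;
            }
        }
        lastArrivalMillis = nowMillis;
        return kept;
    }

    int size() {
        return intervals.size();
    }

    /** Mean of the retained intervals, or 0 if none have been kept yet. */
    double mean() {
        if (intervals.isEmpty()) {
            return 0.0;
        }
        double sum = 0;
        for (long i : intervals) {
            sum += i;
        }
        return sum / intervals.size();
    }
}
```

In this sketch a rejected arrival still updates the last-arrival timestamp, so the next
interval is measured from the most recent heartbeat. Whether that is the right choice is a
design question the description above does not settle.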

The second case is harder to solve: an extremely large interval is recorded, namely the
time the node was left dead the first time. This throws off the mean of the intervals, so
it takes much longer to mark the node down the second time.
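
A rough illustration of the size of that effect. It assumes an exponential model in which
phi = t / (mean * ln 10), and every name and number here (the window size, the one-second
heartbeat, the 24-hour outage) is hypothetical, chosen only for the arithmetic.

```java
import java.util.Arrays;

/** Hypothetical illustration; the exponential model is an assumption. */
class PhiSkewDemo {
    /** Arithmetic mean of the recorded arrival intervals. */
    static double mean(long[] intervalsMillis) {
        double sum = 0;
        for (long i : intervalsMillis) {
            sum += i;
        }
        return sum / intervalsMillis.length;
    }

    /** phi = -log10(e^(-t/mean)) = t / (mean * ln 10). */
    static double phi(double elapsedMillis, double meanMillis) {
        return elapsedMillis / (meanMillis * Math.log(10));
    }

    /** Silence needed before phi reaches the given threshold. */
    static double millisToConvict(double threshold, double meanMillis) {
        return threshold * meanMillis * Math.log(10);
    }

    /** Window of identical intervals; a positive outlier replaces the last slot. */
    static long[] window(int size, long typicalMillis, long outlierMillis) {
        long[] w = new long[size];
        Arrays.fill(w, typicalMillis);
        if (outlierMillis > 0) {
            w[size - 1] = outlierMillis;
        }
        return w;
    }
}
```

With 1000 samples of 1000 ms, replacing one sample with a 24-hour gap (86,400,000 ms) moves
the mean from 1000 ms to 87,399 ms. Under this model the time to reach any given phi
threshold grows by the same factor, roughly 87x.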

