Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Wed, 15 Feb 2012 07:56:00 +0000 (UTC)
From: "Peter Schuller (Assigned) (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: 
 <1361722863.39507.1329292560217.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <1016134593.304.1328139833590.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Assigned] (CASSANDRA-3830) gossip-to-seeds is not obviously
 independent of failure detection algorithm
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller reassigned CASSANDRA-3830:
-----------------------------------------

    Assignee: Peter Schuller
    
> gossip-to-seeds is not obviously independent of failure detection algorithm 
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3830
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> The failure detector, ignoring all the theory, boils down to an
> extremely simple algorithm. The FD keeps track of a sliding window (of
> 1000 currently) intervals of heartbeat for a given host. Meaning, we
> have a track record of the last 1000 times we saw an updated heartbeat
> for a host.
> At any given moment, a host has a score which is simply the time since
> the last heartbeat, over the *mean* interval in the sliding
> window. For historical reasons a simple scaling factor is applied to
> this prior to checking the phi conviction threshold.
> (CASSANDRA-2597 has details, but thanks to Paul's work there it's now
> trivial to understand what it does based on gut feeling)
> So in effect, a host is considered down if we haven't heard from it in
> some time which is significantly longer than the "average" time we
> expect to hear from it.
> This seems reasonable, but it does assume that under normal conditions
> the average time between heartbeats does not change for reasons other
> than those that would be plausible reasons to think a node is
> unhealthy.
> This assumption *could* be violated by the gossip-to-seed
> feature. There is an argument to avoid gossip-to-seed for other
> reasons (see CASSANDRA-3829), but this is a concrete case in which the
> gossip-to-seed could cause a negative side-effect of the general kind
> mentioned in CASSANDRA-3829 (see notes at end about not case w/o seeds
> not being continuously tested). Normally, due to gossip to seed,
> everyone essentially sees latest information within very few hart
> beats (assuming only 2-3 seeds). But should all seeds be down,
> suddenly we flip a switch and start relying on generalized propagation
> in the gossip system, rather than the seed special case.
> The potential problem I forese here is that if the average propagation
> time suddenly spikes when all seeds become available, it could cause
> bogus flapping of nodes into down state.
> In order to test this, I deployeda ~ 180 node cluster with a version
> that logs heartbet information on each interpret(), similar to:
>  INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java (line 187) ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, last interval 7.0, mean is 1557.2777777777778
> It turns out that, at least at 180 nodes, with 4 seed nodes, whether
> or not seeds are running *does not* seem to matter significantly. In
> both cases, the mean interval is around 1500 milliseconds.
> I don't feel I have a good grasp of whether this is incidental or
> guaranteed, and it would be good to at least empirically test
> propagation time w/o seeds at differnet cluster sizes; it's supposed
> to be un-affected by cluster size ({{RING_DELAY}} is static for this
> reason, is my understanding). Would be nice to see this be the case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira