Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 323D599E5 for ; Wed, 15 Feb 2012 07:56:24 +0000 (UTC)
Received: (qmail 24170 invoked by uid 500); 15 Feb 2012 07:56:24 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 24142 invoked by uid 500); 15 Feb 2012 07:56:24 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 24133 invoked by uid 99); 15 Feb 2012 07:56:24 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 15 Feb 2012 07:56:24 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 15 Feb 2012 07:56:21 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 34BAB1B81FB for ; Wed, 15 Feb 2012 07:56:00 +0000 (UTC)
Date: Wed, 15 Feb 2012 07:56:00 +0000 (UTC)
From: "Peter Schuller (Assigned) (JIRA)"
To: commits@cassandra.apache.org
Message-ID: <1361722863.39507.1329292560217.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1016134593.304.1328139833590.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Assigned] (CASSANDRA-3830) gossip-to-seeds is not obviously independent of failure detection algorithm
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

[
https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller reassigned CASSANDRA-3830:
-----------------------------------------

    Assignee: Peter Schuller

> gossip-to-seeds is not obviously independent of failure detection algorithm
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3830
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> The failure detector, ignoring all the theory, boils down to an
> extremely simple algorithm. The FD keeps track of a sliding window
> (currently 1000 entries) of heartbeat intervals for a given host.
> Meaning, we have a track record of the last 1000 times we saw an
> updated heartbeat for a host.
>
> At any given moment, a host has a score which is simply the time since
> the last heartbeat, over the *mean* interval in the sliding
> window. For historical reasons a simple scaling factor is applied to
> this prior to checking the phi conviction threshold.
> (CASSANDRA-2597 has details, but thanks to Paul's work there it's now
> trivial to understand what it does based on gut feeling.)
>
> So in effect, a host is considered down if we haven't heard from it
> for a time significantly longer than the "average" interval at which
> we expect to hear from it.
>
> This seems reasonable, but it does assume that under normal conditions
> the average time between heartbeats does not change for reasons other
> than those that would be plausible reasons to think a node is
> unhealthy.
>
> This assumption *could* be violated by the gossip-to-seed feature.
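As a rough illustration of the score described above (time since the last heartbeat over the mean interval in the sliding window), here is a minimal Python sketch. It is not Cassandra's code: the class and method names are made up for this note, the 1/ln(10) scaling factor is an assumption that happens to agree with the sample log line quoted in this issue, and the threshold of 8 is assumed to match the commonly shipped `phi_convict_threshold` default.

```python
import math
from collections import deque

WINDOW_SIZE = 1000                  # sliding window size given in the issue text
PHI_FACTOR = 1.0 / math.log(10.0)   # ASSUMED scaling factor, not stated in the issue
PHI_CONVICT_THRESHOLD = 8.0         # ASSUMED conviction threshold


class ArrivalWindow:
    """Heartbeat bookkeeping for a single host (hypothetical helper)."""

    def __init__(self, size=WINDOW_SIZE):
        # deque(maxlen=...) silently drops the oldest interval once full.
        self.intervals = deque(maxlen=size)
        self.last_seen_ms = None

    def report(self, now_ms):
        """Record that an updated heartbeat was seen at now_ms."""
        if self.last_seen_ms is not None:
            self.intervals.append(now_ms - self.last_seen_ms)
        self.last_seen_ms = now_ms

    def mean(self):
        """Mean heartbeat interval over the window, in milliseconds."""
        return sum(self.intervals) / len(self.intervals)

    def phi(self, now_ms):
        """Scaled ratio of time since the last heartbeat to the mean interval."""
        if not self.intervals:
            return 0.0  # no history yet, so nothing to compare against
        return PHI_FACTOR * (now_ms - self.last_seen_ms) / self.mean()

    def is_down(self, now_ms):
        """True once the score exceeds the (assumed) conviction threshold."""
        return self.phi(now_ms) > PHI_CONVICT_THRESHOLD
```

The point relevant to this issue is visible in `phi()`: if the mean interval in the window was learned while seeds were reachable and intervals then grow, the same silence is scored against a mean that is too small, so the score rises faster than it should. With the assumed factor, the logged sample (last interval 7.0, mean 1557.28) gives roughly 0.00195, in line with the logged phi.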
> There is an argument to avoid gossip-to-seed for other
> reasons (see CASSANDRA-3829), but this is a concrete case in which
> gossip-to-seed could cause a negative side-effect of the general kind
> mentioned in CASSANDRA-3829 (see the notes at the end about the case
> w/o seeds not being continuously tested). Normally, due to gossip to
> seed, everyone essentially sees the latest information within very few
> heartbeats (assuming only 2-3 seeds). But should all seeds be down,
> suddenly we flip a switch and start relying on generalized propagation
> in the gossip system, rather than the seed special case.
>
> The potential problem I foresee here is that if the average propagation
> time suddenly spikes when all seeds become unavailable, it could cause
> bogus flapping of nodes into down state.
>
> In order to test this, I deployed a ~180 node cluster with a version
> that logs heartbeat information on each interpret(), similar to:
>
> INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java (line 187) ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, last interval 7.0, mean is 1557.2777777777778
>
> It turns out that, at least at 180 nodes, with 4 seed nodes, whether
> or not seeds are running *does not* seem to matter significantly. In
> both cases, the mean interval is around 1500 milliseconds.
>
> I don't feel I have a good grasp of whether this is incidental or
> guaranteed, and it would be good to at least empirically test
> propagation time w/o seeds at different cluster sizes; it's supposed
> to be unaffected by cluster size ({{RING_DELAY}} is static for this
> reason, is my understanding). Would be nice to see this be the case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
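For anyone repeating the with-seeds versus without-seeds comparison, a small script along these lines could pull the mean interval out of log lines shaped like the sample quoted in the issue. This is only a sketch: that log line comes from a locally patched build and is described as "similar to" the real output, so the regular expression is a guess at the format, and `parse_line`, `mean_interval` and the returned field names are hypothetical.

```python
import re

# Sample line copied from the issue text.
SAMPLE = (
    "INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java "
    "(line 187) ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, "
    "last interval 7.0, mean is 1557.2777777777778"
)

# ASSUMED line format, inferred from the single sample above.
LINE_RE = re.compile(
    r"ep (?P<endpoint>\S+) is at phi (?P<phi>[0-9.eE+-]+), "
    r"last interval (?P<last>[0-9.eE+-]+), mean is (?P<mean>[0-9.eE+-]+)"
)


def parse_line(line):
    """Return the fields of one failure detector log line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "endpoint": m.group("endpoint"),
        "phi": float(m.group("phi")),
        "last_interval_ms": float(m.group("last")),
        "mean_ms": float(m.group("mean")),
    }


def mean_interval(lines):
    """Average the reported window means over all matching lines."""
    means = [rec["mean_ms"] for rec in map(parse_line, lines) if rec is not None]
    if not means:
        return None
    return sum(means) / len(means)
```

Running `mean_interval` over one log captured with seeds up and one captured with all seeds down would give the two figures to compare, which the issue reports as both being around 1500 milliseconds on the ~180 node cluster.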