Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3C4DC17AC0 for ; Thu, 2 Apr 2015 13:30:56 +0000 (UTC)
Received: (qmail 48725 invoked by uid 500); 2 Apr 2015 13:30:46 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 48684 invoked by uid 500); 2 Apr 2015 13:30:46 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 48643 invoked by uid 99); 2 Apr 2015 13:30:46 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 02 Apr 2015 13:30:46 +0000
Date: Thu, 2 Apr 2015 13:30:46 +0000 (UTC)
From: "Brandon Williams (JIRA)"
To: commits@cassandra.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Comment Edited] (CASSANDRA-8336) Quarantine nodes after receiving the gossip shutdown message
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389207#comment-14389207 ]

Brandon Williams edited comment on CASSANDRA-8336 at 4/2/15 1:12 PM:
---------------------------------------------------------------------

After wrestling with exceptions for a bit, I came up with a simpler solution. Gossiper's stop() can examine the local state itself, and skip shutdown announcement if it doesn't exist.
We still need stopSilently (which I renamed in this patch from stopForLeaving) for cases like decom, where we aren't coming back and don't want to mutate our state on shutdown.


was (Author: brandon.williams):
After wrestling with exceptions for a bit, I came up with a simpler solution. Gossiper's stop() can examine the local state itself, and skip shutdown announcement if it doesn't exist.
We still need stopSilently (which I renamed in this patch from stopForLeaving) for cases like decom, where we aren't coming back and don't wait to mutate our state on shutdown.

> Quarantine nodes after receiving the gossip shutdown message
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8336
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.15
>
>         Attachments: 8336-v2.txt, 8336-v3.txt, 8336-v4.txt, 8336.txt
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement. The problem here is that this isn't sufficient; you can still get TOEs and have to wait on the FD to figure things out. This happens due to gossip propagation time and variance; if node X shuts down and sends the message to Y, but Z has a greater gossip version than Y for X and has not yet received the message, it can initiate gossip with Y and thus mark X alive again. I propose quarantining to solve this, however I feel it should be a -D parameter you have to specify, so as not to destroy current dev and test practices, since this will mean a node that shuts down will not be able to restart until the quarantine expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
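
The quarantine idea in the issue description can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's Gossiper and not the attached patch: the class QuarantineSketch, its methods, and the system property name example.gossip.quarantine_ms are all hypothetical, and endpoints are plain strings. It shows one way a node Y might ignore stale "X is up" state relayed by Z for a fixed window after X's shutdown announcement, with the window defaulting to zero (disabled) unless a -D property is set.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of quarantining an endpoint after its shutdown announcement.
// Hypothetical names throughout; this is not Cassandra's Gossiper API.
class QuarantineSketch {
    private final long quarantineMillis;
    private final Map<String, Long> quarantinedUntil = new HashMap<>();
    private final Map<String, Boolean> alive = new HashMap<>();

    QuarantineSketch(long quarantineMillis) {
        this.quarantineMillis = quarantineMillis;
    }

    // Reads the window from a hypothetical -D property; 0 means "no quarantine",
    // mirroring the suggestion that the behaviour be opt-in.
    static long configuredQuarantineMillis() {
        return Long.getLong("example.gossip.quarantine_ms", 0L);
    }

    // A shutdown announcement for the endpoint arrived: mark it down and
    // start the quarantine window.
    void onShutdown(String endpoint, long nowMillis) {
        alive.put(endpoint, false);
        quarantinedUntil.put(endpoint, nowMillis + quarantineMillis);
    }

    // A peer relayed state claiming the endpoint is up. Returns true if the
    // state was applied, false if it was dropped because of the quarantine.
    boolean onRemoteState(String endpoint, long nowMillis) {
        Long until = quarantinedUntil.get(endpoint);
        if (until != null && nowMillis < until) {
            return false; // still quarantined: ignore possibly stale state
        }
        quarantinedUntil.remove(endpoint);
        alive.put(endpoint, true);
        return true;
    }

    boolean isAlive(String endpoint) {
        return alive.getOrDefault(endpoint, false);
    }
}
```

In this sketch a restarted node is also ignored until the window expires, which is the trade-off the description raises for dev and test workflows.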