Date: Tue, 28 Apr 2015 14:31:06 +0000 (UTC)
From: "Sergio Bossa (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-9238) Race condition after shutdown gossip message

     [ https://issues.apache.org/jira/browse/CASSANDRA-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Bossa updated CASSANDRA-9238:
------------------------------------
    Attachment: 2.0-CASSANDRA-9238-v2.txt

[~brandon.williams], I've attached a v2 patch that addresses the problem in a different way: it properly closes all established connections when the {{MessagingService}} shuts down, so the sender node ends up creating new connections once the shut-down node starts listening again.
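To make the failure mode and the v2 idea concrete, here is a toy, single-process Python model. It is not Cassandra code: every class, method and field in it (`Receiver`, `Sender`, `generation`, `close_on_shutdown`, `run`) is hypothetical. It assumes, as a simplification, that a connection accepted by a previous incarnation of the receiver silently drops messages after a restart, and that the sender reconnects only when it knows its current connection is closed. A gossip message exchanged in the shutdown window reopens the connection; unless shutdown closes established connections, the sender keeps using that stale connection after the restart.

```python
"""Toy model of the shutdown race and the "close all connections" fix.

All names are hypothetical; this does not use Cassandra's MessagingService API.
"""


class Connection:
    """An established connection, tagged with the receiver incarnation that accepted it."""

    def __init__(self, receiver):
        self.open = True
        self.generation = receiver.generation


class Receiver:
    """The node being shut down and later restarted."""

    def __init__(self, close_on_shutdown):
        self.close_on_shutdown = close_on_shutdown
        self.listening = True
        self.generation = 0
        self.inbound = []    # established connections accepted so far
        self.delivered = []  # messages seen by the current incarnation

    def accept(self):
        if not self.listening:
            raise ConnectionRefusedError("receiver is not listening")
        conn = Connection(self)
        self.inbound.append(conn)
        return conn

    def deliver(self, conn, msg):
        # Simplification: only connections accepted by the current
        # incarnation reach its handlers; stale ones drop messages.
        if conn.generation == self.generation:
            self.delivered.append(msg)

    def shutdown(self):
        self.listening = False  # stop accepting new connections
        if self.close_on_shutdown:
            # The v2 idea: also close every already established connection.
            for conn in self.inbound:
                conn.open = False
            self.inbound.clear()

    def restart(self):
        self.generation += 1
        self.listening = True
        self.delivered = []


class Sender:
    """The peer that receives the SHUTDOWN message."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.conn = None

    def close_outbound(self):
        self.conn = None

    def send(self, msg):
        # Reconnect only if there is no connection or it is known to be closed.
        if self.conn is None or not self.conn.open:
            self.conn = self.receiver.accept()
        self.receiver.deliver(self.conn, msg)


def run(close_on_shutdown):
    node_b = Receiver(close_on_shutdown)
    node_a = Sender(node_b)
    node_a.close_outbound()              # SHUTDOWN received: outbound socket closed
    node_a.send("SYN in race window")    # ...but B still listens, so it reopens
    node_b.shutdown()
    node_b.restart()
    node_a.send("message after restart")
    return node_b.delivered
```

With `close_on_shutdown=False`, `run` returns an empty list because the post-restart message goes over the stale connection; with `close_on_shutdown=True`, the sender sees its connection closed, reconnects to the new incarnation, and the message is delivered.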
The v2 patch seems to fix my original problem: I verified this with the previous (now committed) patch commented out, so you may eventually want to revert that one. Also, this _might_ fix CASSANDRA-8072 as well (I verified via netstat that all connections are actually closed), but I only glanced through your last comments there, so I might be wrong.

> Race condition after shutdown gossip message
> --------------------------------------------
>
>                 Key: CASSANDRA-9238
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9238
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Minor
>             Fix For: 2.0.15, 2.1.6
>
>         Attachments: 2.0-CASSANDRA-9238-v2.txt, 2.0-CASSANDRA-9238.txt
>
>
> CASSANDRA-8336 introduced a race condition that causes gossip messages to be sent to shut-down nodes even after they have already been marked dead.
> That's because CASSANDRA-8336 changed (among other things) the way the SHUTDOWN gossip message is sent, by moving it before the gossip task (the one sending SYN messages) and by adding a wait of a few seconds between the two. This opens a race window on the receiving side between the time the SHUTDOWN message is received, which causes the outbound sockets to be closed, and the moment the other side's listening socket is actually closed: any SYN gossip message exchanged in that window reopens the sockets, which are then never closed again because the node is already marked dead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)