Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Tue, 28 Apr 2015 14:31:06 +0000 (UTC)
From: "Sergio Bossa (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12823853.1429902598000.7573.1430231466560@Atlassian.JIRA>
In-Reply-To: <JIRA.12823853.1429902598000@Atlassian.JIRA>
References: <JIRA.12823853.1429902598000@Atlassian.JIRA>
 <JIRA.12823853.1429902598386@arcas>
Subject: [jira] [Updated] (CASSANDRA-9238) Race condition after shutdown
 gossip message
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/CASSANDRA-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Bossa updated CASSANDRA-9238:
------------------------------------
    Attachment: 2.0-CASSANDRA-9238-v2.txt

[~brandon.williams], I've attached a v2 patch that addresses the problem in a different way: it properly closes all established connections when the {{MessagingService}} shuts down, so that the sender node ends up creating new connections once the shut-down node starts listening again.

This seems to fix my original problem: I verified it by commenting out the previous (now committed) patch, so you might eventually want to revert that one. This _might_ also fix CASSANDRA-8072 (I verified via netstat that all connections are actually closed), but I only glanced through your last comments there, so I might be wrong.
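
To illustrate the idea of closing every established connection at shutdown, here is a minimal sketch in Java. It is not the attached patch and does not use Cassandra's actual {{MessagingService}} code: the class {{ConnectionTracker}} and its methods are hypothetical names chosen for this example, and connections are modelled as plain {{Closeable}} objects.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not Cassandra code): remembers every established
 * connection so that all of them can be closed when the service shuts down.
 */
final class ConnectionTracker implements Closeable {
    private final Set<Closeable> established = ConcurrentHashMap.newKeySet();
    private volatile boolean shutDown = false;

    /**
     * Registers a newly established connection.
     * Returns false, and closes the connection, if shutdown has already begun.
     */
    boolean register(Closeable connection) {
        established.add(connection);
        if (shutDown) {
            // Shutdown may have raced with the add above: whoever removes
            // the entry from the set is the one that closes it.
            if (established.remove(connection)) {
                closeQuietly(connection);
            }
            return false;
        }
        return true;
    }

    int establishedCount() {
        return established.size();
    }

    /** Shuts down: closes every tracked connection exactly once. */
    @Override
    public void close() {
        shutDown = true;
        for (Closeable connection : established) {
            if (established.remove(connection)) {
                closeQuietly(connection);
            }
        }
    }

    private static void closeQuietly(Closeable connection) {
        try {
            connection.close();
        } catch (IOException ignored) {
            // Best effort: the service is going away anyway.
        }
    }
}
```

In a real server the objects passed to {{register}} would be the accepted sockets. Once they are closed on the shutting-down side, the peer sees the connection drop and has to open a fresh one later, instead of holding on to a stale socket.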

> Race condition after shutdown gossip message
> --------------------------------------------
>
>                 Key: CASSANDRA-9238
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9238
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Minor
>             Fix For: 2.0.15, 2.1.6
>
>         Attachments: 2.0-CASSANDRA-9238-v2.txt, 2.0-CASSANDRA-9238.txt
>
>
> CASSANDRA-8336 introduced a race condition that causes gossip messages to be sent to shut-down nodes even after they have been marked dead.
> That's because CASSANDRA-8336 changed (among other things) the way the SHUTDOWN gossip message is sent, by moving it before the gossip task (the one sending SYN messages) and putting a wait of a few seconds between the two. This opens a race window on the receiving side between the moment the SHUTDOWN message is received, which causes the outbound sockets to be closed, and the moment the other side's listening socket is actually closed. Any SYN gossip message exchanged in that window will reopen the sockets and never close them again, as the node is already marked dead.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)