cassandra-commits mailing list archives

From "Sergio Bossa (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-9238) Race condition after shutdown gossip message
Date Tue, 28 Apr 2015 14:31:06 GMT


Sergio Bossa updated CASSANDRA-9238:
    Attachment: 2.0-CASSANDRA-9238-v2.txt

[~brandon.williams], I've attached a v2 patch that addresses the problem in a different way:
it properly closes all established connections when the {{MessagingService}} shuts down,
so that the sender node ends up creating new connections once the shutdown node starts
listening again.
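
To illustrate the idea (this is not the attached patch), here is a minimal sketch of a component that tracks established connections and closes all of them at shutdown. {{ConnectionRegistry}} and its methods are hypothetical names invented for this example; they are not part of Cassandra's {{MessagingService}}.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a messaging layer; NOT Cassandra's MessagingService.
public class ConnectionRegistry {
    private final Map<String, Closeable> established = new ConcurrentHashMap<>();
    private final AtomicBoolean shutDown = new AtomicBoolean(false);

    // Track a connection. Returns false (and closes it) if shutdown has begun.
    public boolean register(String peer, Closeable connection) {
        if (shutDown.get()) {
            closeQuietly(connection);
            return false;
        }
        Closeable previous = established.put(peer, connection);
        if (previous != null) {
            closeQuietly(previous); // do not leak a replaced connection
        }
        // Re-check: shutdown may have started between the first check and the put.
        if (shutDown.get() && established.remove(peer, connection)) {
            closeQuietly(connection);
            return false;
        }
        return true;
    }

    // Close every established connection; returns how many this call closed.
    public int shutdown() {
        shutDown.set(true);
        int closed = 0;
        for (String peer : established.keySet()) {
            Closeable connection = established.remove(peer);
            if (connection != null) {
                closeQuietly(connection);
                closed++;
            }
        }
        return closed;
    }

    public int openCount() {
        return established.size();
    }

    private static void closeQuietly(Closeable connection) {
        try {
            connection.close();
        } catch (IOException ignored) {
            // Best effort: the process is going away anyway.
        }
    }
}
```

With every connection closed on the shutdown side, a peer that wants to talk to the node again has to open a fresh connection, which can only succeed once the node is listening again.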

This seems to fix my original problem: I verified it with the previous (now committed)
patch commented out, so you might eventually want to revert that patch. Also, this _might_
fix CASSANDRA-8072 as well (I verified via netstat that all connections are actually closed),
but I only glanced through your last comments there, so I might be wrong.

> Race condition after shutdown gossip message
> --------------------------------------------
>                 Key: CASSANDRA-9238
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Minor
>             Fix For: 2.0.15, 2.1.6
>         Attachments: 2.0-CASSANDRA-9238-v2.txt, 2.0-CASSANDRA-9238.txt
> CASSANDRA-8336 introduced a race condition causing gossip messages to be sent to shutdown
> nodes even if they have already been marked dead.
> That's because CASSANDRA-8336 changed (among other things) the way the SHUTDOWN gossip
> message is sent, by moving it before the gossip task (the one sending SYN messages) is
> stopped and by putting a wait of a few seconds between the two. This opens a race window
> on the receiving side between the time the SHUTDOWN message is received, causing the
> outbound sockets to be closed, and the moment the other side's listening socket is actually
> closed. Any SYN gossip message exchanged in that window will reopen the sockets, and they
> will never be closed again because the node is already marked dead.
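
The window described in the issue can be walked through with a small deterministic model of the receiving side. This is only a sketch under simplifying assumptions: {{GossipRaceModel}} and all of its methods are hypothetical, sockets are reduced to set membership, and the assumption that cleanup runs only on the live-to-dead transition is a simplification of the description, not a statement about Cassandra's code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the receiving node; not Cassandra code.
public class GossipRaceModel {
    private final Set<String> deadPeers = new HashSet<>();
    private final Set<String> openOutbound = new HashSet<>();

    // An outbound socket to the peer exists during normal operation.
    public void connect(String peer) {
        openOutbound.add(peer);
    }

    // SHUTDOWN received: mark the peer dead and close the outbound socket.
    public void onShutdownMessage(String peer) {
        markDead(peer);
    }

    // A SYN sent in the window: the send path reopens the socket lazily,
    // which succeeds as long as the remote listener is still accepting.
    public void sendSyn(String peer, boolean remoteStillListening) {
        if (remoteStillListening) {
            openOutbound.add(peer);
        }
    }

    // Assumption: sockets are closed only on the live-to-dead transition,
    // so a peer that is already dead is skipped.
    public void markDead(String peer) {
        if (deadPeers.add(peer)) {
            openOutbound.remove(peer);
        }
    }

    public boolean hasLeakedSocket(String peer) {
        return deadPeers.contains(peer) && openOutbound.contains(peer);
    }
}
```

Calling {{connect}}, then {{onShutdownMessage}}, then {{sendSyn(peer, true)}}, then {{markDead}} again leaves {{hasLeakedSocket}} returning true; if the remote listener is already closed when the SYN is sent ({{sendSyn(peer, false)}}), no socket is left open.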

This message was sent by Atlassian JIRA
