Date: Thu, 18 Jul 2013 12:28:48 +0000 (UTC)
From: "Sylvain Lebresne (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-5769) Not all STATUS_CHANGE UP events reported via the native protocol

     [ https://issues.apache.org/jira/browse/CASSANDRA-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-5769:
----------------------------------------
    Attachment: 5769.txt

We are currently calling the native protocol notification in SS.handleStateNormal(), but that's only called on major state changes, while in this case there is no generation change.
I wouldn't claim the gossiper code is always easy to follow, but it seems that moving the notification code from SS.handleStateNormal() to SS.onAlive() ensures we always notify (without over-notifying), so I'm attaching a patch that does that. I've checked that it fixes the notification in the case above.

> Not all STATUS_CHANGE UP events reported via the native protocol
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-5769
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5769
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.5, 1.2.6
>         Environment: Ubuntu 12.04, x86, 64 bit
>            Reporter: Duncan Sands
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>         Attachments: 5769.txt
>
>
> Not all gossip UP events are pushed to native protocol users who have registered for them. This seems to be a native protocol issue because the nodes themselves get the UP event (as seen in their logs). I can consistently reproduce this issue as follows:
> 1) connect a client to a cluster node ("node1") using the native protocol, and register for TOPOLOGY_CHANGE and STATUS_CHANGE events. (You probably only need to register for STATUS_CHANGE to see this, but my client registers for both.)
> 2) on another node ("node2"), send SIGSTOP to the Cassandra process.
> 3) after about 30 seconds the client gets pushed a STATUS_CHANGE DOWN event for the stopped node.
> 4) on node2, send SIGCONT to the Cassandra process.
> 5) wait for a STATUS_CHANGE UP event. This is the failure: no event is ever received.
> Observe that node1 does know that node2 is back up: in its system log I see, for example,
>  INFO [GossipStage:1] 2013-07-17 14:27:41,238 Gossiper.java (line 771) InetAddress /172.18.34.169 is now UP
> shortly after sending SIGCONT to the stopped process.
> To eliminate the possibility that my client is at fault, I performed the following sanity check:
> 2') on node2, stopped Cassandra nicely using: sudo service cassandra stop
> 4') on node2, restarted Cassandra using: sudo service cassandra start
> In this case the client soon after gets a STATUS_CHANGE DOWN event followed by a STATUS_CHANGE UP event for node2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
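
For illustration only, the diagnosis in the comment above can be sketched as a toy model. This is not Cassandra's code: ToyNode, EndpointState, receive_gossip and scenario are hypothetical names, and the model assumes only what the comment states, namely that the "state normal" handler runs on a generation change (a process restart) while the "alive" handler runs whenever a peer previously marked down is seen again. A SIGSTOP/SIGCONT cycle keeps the same generation, so under that assumption only the second hook fires.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EndpointState:
    generation: int   # assumed to change only when the peer process restarts
    alive: bool = True


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node that pushes STATUS_CHANGE events to clients."""
    notify_on_alive: bool  # False: push UP from handle_state_normal; True: from on_alive
    peers: Dict[str, EndpointState] = field(default_factory=dict)
    pushed: List[Tuple[str, str]] = field(default_factory=list)

    def mark_down(self, addr: str) -> None:
        # Failure detection: the peer stopped gossiping.
        self.peers[addr].alive = False
        self.pushed.append(("DOWN", addr))

    def receive_gossip(self, addr: str, generation: int) -> None:
        known = self.peers.get(addr)
        major_change = known is None or generation > known.generation
        was_down = known is None or not known.alive
        self.peers[addr] = EndpointState(generation, alive=True)
        if major_change:
            self.handle_state_normal(addr)
        if was_down:
            self.on_alive(addr)

    def handle_state_normal(self, addr: str) -> None:
        if not self.notify_on_alive:
            self.pushed.append(("UP", addr))

    def on_alive(self, addr: str) -> None:
        if self.notify_on_alive:
            self.pushed.append(("UP", addr))


def scenario(notify_on_alive: bool, restart: bool) -> List[Tuple[str, str]]:
    """Bring node2 up, take it down, bring it back; return the events pushed after startup."""
    node = ToyNode(notify_on_alive)
    node.receive_gossip("node2", generation=1)
    node.pushed.clear()
    node.mark_down("node2")
    # A restart bumps the generation; SIGSTOP/SIGCONT resumes with the same one.
    node.receive_gossip("node2", generation=2 if restart else 1)
    return node.pushed


if __name__ == "__main__":
    for hook in (False, True):
        for restart in (False, True):
            print(f"notify_on_alive={hook} restart={restart}: {scenario(hook, restart)}")
```

In this toy model, notifying from the state-normal hook drops the UP event in the SIGSTOP/SIGCONT case but not in the restart case, which matches the behaviour reported in the issue. Notifying from the alive hook yields exactly one UP event in both cases.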