Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A01A411D86 for ; Tue, 26 Aug 2014 21:37:58 +0000 (UTC) Received: (qmail 40719 invoked by uid 500); 26 Aug 2014 21:37:58 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 40677 invoked by uid 500); 26 Aug 2014 21:37:58 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 40513 invoked by uid 99); 26 Aug 2014 21:37:58 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 26 Aug 2014 21:37:58 +0000 Date: Tue, 26 Aug 2014 21:37:58 +0000 (UTC) From: "nayden kolev (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CASSANDRA-7825) node decommission leaves ghost nodes in system.peers table and JMX MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nayden kolev updated CASSANDRA-7825: ------------------------------------ Priority: Minor (was: Major) > node decommission leaves ghost nodes in system.peers table and JMX > ------------------------------------------------------------------ > > Key: CASSANDRA-7825 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7825 > Project: Cassandra > Issue Type: Bug > Environment: OS: Ubuntu 12.04.4 LTS > Cassandra: ReleaseVersion: 2.0.8.39 > DSE 4.5.1 > OpsCenter: 5.0.0 > Reporter: nayden kolev > Priority: Minor > > I have a 4-node cluster (split in 2 DCs) running DSE 4.5.1, C* 2.0.8.39. I needed to cycle a node (add a new node and remove one). I followed this doc (more specifically steps 1 and 2): > http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_remove_node_t.html > After the decom, the decommissioned node logged this: > INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,243 ThriftServer.java (line 141) Stop listening to thrift clients > INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,269 Server.java (line 182) Stop listening for CQL clients > INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,270 Gossiper.java (line 1279) Announcing shutdown > INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:10,271 MessagingService.java (line 683) Waiting for messaging service to quiesce > INFO [ACCEPT-/10.1.129.27] 2014-08-23 09:57:10,272 MessagingService.java (line 923) MessagingService has terminated the accept() thread > INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:10,280 StorageService.java (line 1007) DECOMMISSIONED > The decommissioned node no longer appears in OpsCenter, and 'nodetool status' shows it gone from the cluster as well, with the remaining 4 nodes un UN state. > All is good... Then I noticed that the DownEndpointCount (still) shows as 1 - using a JMX console, and browsing to org.apache.cassandra.net, FailureDetector, Attributes, DownEdpointCount. While there, I also noticed that SimpleStates shows the decommissioned node as down, and the AllEndpointStates shows it as STATUS:LEFT > I tried running a 'nodetool removenode decom-node's-host-id', but it failed with "Host ID not found", which I expected, given I decommissioned it and it does not show in nodetool status. > nodetool describecluster lists only the expected 4 nodes (does not show the decommissioned node) > checking the system.peers table lists the decomm-ed node with a null host_id, rack, release_version, rpc_address, schema_version, etc. > Adding JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false" to the Cassandra-env.sh as suggested here: > https://issues.apache.org/jira/browse/CASSANDRA-6053 > does not help. I have actually tried this before, when I was decommissioning a node on an older C* version and it worked, but now it does nothing. If I delete the row mentioning the decommissioned node from the system.peers table it stays out of there until the next dse service restart. > This is causing apps to timeout, since they get a invalid node's IP... As a workaround I remove the entry from the peers table, but it is not permanent... -- This message was sent by Atlassian JIRA (v6.2#6252)