Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id AD49B11545 for ; Mon, 25 Aug 2014 17:12:58 +0000 (UTC)
Received: (qmail 55091 invoked by uid 500); 25 Aug 2014 17:12:58 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 55053 invoked by uid 500); 25 Aug 2014 17:12:58 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 55041 invoked by uid 99); 25 Aug 2014 17:12:58 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 25 Aug 2014 17:12:58 +0000
Date: Mon, 25 Aug 2014 17:12:58 +0000 (UTC)
From: "nayden kolev (JIRA)"
To: commits@cassandra.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Created] (CASSANDRA-7825) node decommission leaves ghost nodes in system.peers table and JMX
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

nayden kolev created CASSANDRA-7825:
---------------------------------------

             Summary: node decommission leaves ghost nodes in system.peers table and JMX
                 Key: CASSANDRA-7825
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7825
             Project: Cassandra
          Issue Type: Bug
         Environment: OS: Ubuntu 12.04.4 LTS
                      Cassandra: ReleaseVersion: 2.0.8.39
                      DSE 4.5.1
                      OpsCenter: 5.0.0
            Reporter: nayden kolev


I have a 4-node cluster (split in 2 DCs) running DSE 4.5.1, C* 2.0.8.39. I needed to cycle a node (add a new node and remove one).
I followed this doc (more specifically steps 1 and 2):
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_remove_node_t.html

After the decom, the decommissioned node logged this:

INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,243 ThriftServer.java (line 141) Stop listening to thrift clients
INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,269 Server.java (line 182) Stop listening for CQL clients
INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,270 Gossiper.java (line 1279) Announcing shutdown
INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:10,271 MessagingService.java (line 683) Waiting for messaging service to quiesce
INFO [ACCEPT-/10.1.129.27] 2014-08-23 09:57:10,272 MessagingService.java (line 923) MessagingService has terminated the accept() thread
INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:10,280 StorageService.java (line 1007) DECOMMISSIONED

The decommissioned node no longer appears in OpsCenter, and 'nodetool status' shows it gone from the cluster as well, with the remaining 4 nodes in UN state. All is good...

Then I noticed that the DownEndpointCount (still) shows as 1, using a JMX console and browsing to org.apache.cassandra.net, FailureDetector, Attributes, DownEndpointCount. While there, I also noticed that SimpleStates shows the decommissioned node as down, and AllEndpointStates shows it as STATUS:LEFT.

I tried running 'nodetool removenode' with the decommissioned node's host ID, but it failed with "Host ID not found", which I expected, given that I decommissioned it and it does not show in nodetool status.

nodetool describecluster lists only the expected 4 nodes (it does not show the decommissioned node).

Checking the system.peers table shows that it still lists the decommissioned node, with a null host_id, rack, release_version, rpc_address, schema_version, etc.
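
To make the symptom concrete, here is a minimal sketch of how the ghost rows could be picked out of a dump of system.peers, using the null host_id described above as the marker. It works on plain dictionaries rather than a live connection. The function name find_ghost_peers, the sample addresses, and the sample host ID are all made up for illustration and are not taken from the affected cluster.

```python
from typing import Iterable, List, Mapping, Optional


def find_ghost_peers(rows: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Return the 'peer' address of every row whose host_id is null.

    Assumption: each row is a mapping with at least the 'peer' and
    'host_id' columns of system.peers, with SQL NULL represented as None.
    """
    return [row["peer"] for row in rows if row.get("host_id") is None]


# Hypothetical rows: one healthy peer and one ghost entry.
sample_rows = [
    {"peer": "192.0.2.11", "host_id": "11111111-2222-3333-4444-555555555555"},
    {"peer": "192.0.2.14", "host_id": None},
]

ghosts = find_ghost_peers(sample_rows)
print(ghosts)
```

A row flagged this way should still be compared against 'nodetool status' before anything is done with it, since this check only looks at the null host_id and knows nothing about gossip state.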

Adding JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false" to cassandra-env.sh, as suggested here:
https://issues.apache.org/jira/browse/CASSANDRA-6053
does not help. I have tried this before, when I was decommissioning a node on an older C* version, and it worked, but now it does nothing.

If I delete the row for the decommissioned node from the system.peers table, it stays out of there until the next dse service restart.

This is causing apps to time out, since they get an invalid node's IP...

As a workaround I remove the entry from the peers table, but it is not permanent...



--
This message was sent by Atlassian JIRA
(v6.2#6252)