cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "nayden kolev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-7825) node decommission leaves ghost nodes in system.peers table and JMX
Date Mon, 25 Aug 2014 17:12:58 GMT
nayden kolev created CASSANDRA-7825:
---------------------------------------

             Summary: node decommission leaves ghost nodes in system.peers table and JMX
                 Key: CASSANDRA-7825
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7825
             Project: Cassandra
          Issue Type: Bug
         Environment: OS: Ubuntu 12.04.4 LTS
Cassandra: ReleaseVersion: 2.0.8.39
DSE 4.5.1
OpsCenter: 5.0.0

            Reporter: nayden kolev


I have a 4-node cluster (split in 2 DCs) running DSE 4.5.1, C* 2.0.8.39. I needed to cycle
a node (add a new node and remove one). I followed this doc (more specifically steps 1 and
2):
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_remove_node_t.html

After the decom, the decommissioned node logged this:

INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,243 ThriftServer.java (line
141) Stop listening to thrift clients
INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,269 Server.java (line 182) Stop
listening for CQL clients
INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,270 Gossiper.java (line 1279)
Announcing shutdown
INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:10,271 MessagingService.java (line
683) Waiting for messaging service to quiesce
INFO [ACCEPT-/10.1.129.27] 2014-08-23 09:57:10,272 MessagingService.java (line 923) MessagingService
has terminated the accept() thread
INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:10,280 StorageService.java (line
1007) DECOMMISSIONED

The decommissioned node no longer appears in OpsCenter, and 'nodetool status' shows it gone
from the cluster as well, with the remaining 4 nodes un UN state.

All is good... Then I noticed that the DownEndpointCount (still) shows as 1 - using a JMX
console, and browsing to org.apache.cassandra.net, FailureDetector, Attributes, DownEdpointCount.
While there, I also noticed that SimpleStates shows the decommissioned node as down, and the
AllEndpointStates shows it as STATUS:LEFT

I tried running a 'nodetool removenode decom-node's-host-id', but it failed with "Host ID
not found", which I expected, given I decommissioned it and it does not show in nodetool status.

nodetool describecluster lists only the expected 4 nodes (does not show the decommissioned
node)

checking the system.peers table lists the decomm-ed node with a null host_id, rack, release_version,
rpc_address, schema_version, etc.

Adding JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false" to the Cassandra-env.sh as suggested
here:

https://issues.apache.org/jira/browse/CASSANDRA-6053

does not help. I have actually tried this before, when I was decommissioning a node on an
older C* version and it worked, but now it does nothing. If I delete the row mentioning the
decommissioned node from the system.peers table it stays out of there until the next dse service
restart.

This is causing apps to timeout, since they get a invalid node's IP... As a workaround I remove
the entry from the peers table, but it is not permanent...




--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message