cassandra-commits mailing list archives

From "nayden kolev (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7825) node decommission leaves ghost nodes in system.peers table and JMX
Date Tue, 26 Aug 2014 21:41:57 GMT


nayden kolev commented on CASSANDRA-7825:

So it looks like the stats and the peers table cleared 3 days after the decommission was
performed. I am guessing this is what Brandon Williams was referring to with "we store dead
gossip states for 3 days".

The JMX counter values still cause confusion, since they show a down host while OpsCenter and
nodetool do not. Additionally, the decommissioned node still shows up for clients (I am looking
into this further to see what the impact is and whether there is a workaround).
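A minimal sketch of how to estimate when the ghost entry should disappear. It assumes the 3-day retention quoted above is counted from the moment of the decommission; where exactly Cassandra's gossiper starts that clock is not confirmed in this thread. The function names are hypothetical, and the timestamp comes from the decommission log quoted below (time zone unknown, so it is treated as naive local time).

```python
from datetime import datetime, timedelta

# Retention window for dead gossip states, per the "3 days" quoted above.
# Assumption: the window starts at the decommission time.
DEAD_STATE_RETENTION = timedelta(days=3)


def expected_eviction(decommissioned_at: datetime) -> datetime:
    """Earliest time the LEFT state should be purged (hypothetical helper)."""
    return decommissioned_at + DEAD_STATE_RETENTION


def is_still_retained(decommissioned_at: datetime, now: datetime) -> bool:
    """True while the ghost entry is still inside the retention window."""
    return now < expected_eviction(decommissioned_at)


if __name__ == "__main__":
    # Taken from the "Waiting for messaging service to quiesce" log line.
    decom = datetime(2014, 8, 23, 9, 57, 10)
    print(expected_eviction(decom))  # 2014-08-26 09:57:10
```

That estimate falls on 26 August, the same day as this comment, which fits the observation that the entries cleared after roughly 3 days.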

> node decommission leaves ghost nodes in system.peers table and JMX
> ------------------------------------------------------------------
>                 Key: CASSANDRA-7825
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: OS: Ubuntu 12.04.4 LTS
> Cassandra: ReleaseVersion:
> DSE 4.5.1
> OpsCenter: 5.0.0
>            Reporter: nayden kolev
>            Priority: Minor
> I have a 4-node cluster (split in 2 DCs) running DSE 4.5.1, C*. I needed to
> cycle a node (add a new node and remove one). I followed this doc (more specifically, steps
> 1 and 2):
> After the decom, the decommissioned node logged this:
> INFO [RMI TCP Connection(17)-] 2014-08-23 09:57:08,243 (line 141) Stop listening to thrift clients
> INFO [RMI TCP Connection(17)-] 2014-08-23 09:57:08,269 (line 182) Stop listening for CQL clients
> INFO [RMI TCP Connection(17)-] 2014-08-23 09:57:08,270 (line 1279) Announcing shutdown
> INFO [RMI TCP Connection(17)-] 2014-08-23 09:57:10,271 (line 683) Waiting for messaging service to quiesce
> INFO [ACCEPT-/] 2014-08-23 09:57:10,272 (line 923) MessagingService has terminated the accept() thread
> INFO [RMI TCP Connection(17)-] 2014-08-23 09:57:10,280
> The decommissioned node no longer appears in OpsCenter, and 'nodetool status' shows it
> gone from the cluster as well, with the remaining 4 nodes in UN state.
> All is good... Then I noticed that DownEndpointCount (still) shows as 1 in a
> JMX console, browsing to, FailureDetector, Attributes, DownEndpointCount.
> While there, I also noticed that SimpleStates shows the decommissioned node as down, and
> AllEndpointStates shows it as STATUS:LEFT.
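A minimal sketch of cross-checking the two views described above: endpoints that the FailureDetector's SimpleStates attribute reports as DOWN but that 'nodetool status' no longer lists are ghost candidates. It assumes the SimpleStates map has already been copied out of a JMX console as endpoint to "UP"/"DOWN" strings, with keys possibly carrying the leading "/" that Java's InetAddress.toString() produces. The helper names and the 192.0.2.x addresses are hypothetical.

```python
from typing import Dict, Iterable, Set


def normalize(endpoint: str) -> str:
    """Strip the leading '/' (and any hostname before it), e.g.
    '/192.0.2.9' -> '192.0.2.9'."""
    return endpoint.rsplit("/", 1)[-1]


def ghost_endpoints(simple_states: Dict[str, str],
                    status_nodes: Iterable[str]) -> Set[str]:
    """Endpoints the failure detector reports as DOWN that nodetool
    status no longer lists."""
    known = {normalize(n) for n in status_nodes}
    return {
        normalize(ep)
        for ep, state in simple_states.items()
        if state.upper() == "DOWN" and normalize(ep) not in known
    }


if __name__ == "__main__":
    # Hypothetical snapshot: 4 live nodes plus one decommissioned node.
    states = {"/192.0.2.1": "UP", "/192.0.2.2": "UP",
              "/192.0.2.3": "UP", "/192.0.2.4": "UP",
              "/192.0.2.9": "DOWN"}
    status = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
    print(ghost_endpoints(states, status))  # {'192.0.2.9'}
```

A non-empty result alongside a DownEndpointCount of 1 matches the situation in this report.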
> I tried running 'nodetool removenode decom-node's-host-id', but it failed with "Host
> ID not found", which I expected, given that I decommissioned it and it does not show up in nodetool.
> nodetool describecluster lists only the expected 4 nodes (it does not show the decommissioned node).
> However, the system.peers table still lists the decommissioned node, with a null host_id, rack,
> release_version, rpc_address, schema_version, etc.
> Adding JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false" to the
> as suggested here:
> does not help. I have tried this before, when I was decommissioning a node on an older
> C* version, and it worked then, but now it does nothing. If I delete the row for the
> decommissioned node from the system.peers table, it stays out of the table until the next
> DSE service restart.
> This causes apps to time out, since they receive the IP of an invalid node. As a workaround,
> I remove the entry from the peers table, but the fix is not permanent.
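A minimal sketch of the workaround described above: find system.peers rows whose host_id is null (as seen for the decommissioned node) and emit the CQL that removes them. It assumes the rows were already fetched, for example with `SELECT peer, host_id FROM system.peers`, and converted to plain dicts. The helper names and addresses are hypothetical. Because system tables are local to each node, the statements would presumably have to run on every node that still lists the ghost. Per this report, the row returns after the next service restart.

```python
from typing import Iterable, List, Mapping, Optional


def ghost_peers(rows: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Peers whose host_id is null, sorted for stable output."""
    return sorted(row["peer"] for row in rows if row.get("host_id") is None)


def delete_statements(peers: Iterable[str]) -> List[str]:
    """CQL that removes each ghost row from the local system.peers table.
    This is only a temporary fix, as described in the report."""
    return [f"DELETE FROM system.peers WHERE peer = '{p}';" for p in peers]


if __name__ == "__main__":
    rows = [
        {"peer": "192.0.2.1", "host_id": "hypothetical-host-id"},
        {"peer": "192.0.2.9", "host_id": None},  # decommissioned node
    ]
    for stmt in delete_statements(ghost_peers(rows)):
        print(stmt)  # DELETE FROM system.peers WHERE peer = '192.0.2.9';
```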

This message was sent by Atlassian JIRA
