cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10371) Decommissioned nodes can remain in gossip
Date Thu, 18 Feb 2016 17:38:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152695#comment-15152695 ]

Joel Knighton commented on CASSANDRA-10371:
-------------------------------------------

Thanks - those logs confirm my suspicion that 10.0.2.128 is propagating the EndpointState
through the cluster and not evicting it. One more piece of information will allow me to root-cause
this and suggest a fix.

If you connect to 10.0.2.128 over JMX, on org.apache.cassandra.net.FailureDetector, there
should be an operation dumpInterArrivalTimes(). Invoking that operation over JMX will create
a file in the Java temporary directory (likely just "/tmp") called "failuredetector-{SOME
NUMBERS}.dat". If you could attach that file to this ticket, I can diagnose the issue further.
There is no sensitive information in that file; it will just contain the samples of gossip
arrival time for nodes in the cluster.
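
For anyone who prefers to script this rather than use a GUI JMX client, the invocation could look roughly like the sketch below. It is a sketch under assumptions, not a tested procedure for this cluster: it assumes JMX is reachable on the default port 7199 without authentication or SSL, and that the MBean is registered as "org.apache.cassandra.net:type=FailureDetector". The class name FailureDetectorDump and its helper methods are made up for illustration.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper class; not part of Cassandra.
public class FailureDetectorDump {
    // Assumed MBean name for the failure detector.
    static final String FD_MBEAN = "org.apache.cassandra.net:type=FailureDetector";

    // Builds the standard RMI-based JMX service URL for a host and port.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Connects to the node and invokes the no-argument dumpInterArrivalTimes() operation.
    static void dump(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            mbs.invoke(new ObjectName(FD_MBEAN), "dumpInterArrivalTimes",
                       new Object[0], new String[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "10.0.2.128";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7199; // assumed default JMX port
        dump(host, port);
        System.out.println("Invoked dumpInterArrivalTimes on " + host + ":" + port);
    }
}
```

Note that the .dat file is written on the node being queried (10.0.2.128), not on the machine running this client, so it has to be collected from that node's temporary directory.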

Thanks again; having access to a running cluster that exhibits this issue is tremendously
helpful.

> Decommissioned nodes can remain in gossip
> -----------------------------------------
>
>                 Key: CASSANDRA-10371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Brandon Williams
>            Assignee: Joel Knighton
>            Priority: Minor
>
> This may apply to other dead states as well.  Dead states should be expired after 3 days.
> In the case of decom we attach a timestamp to let the other nodes know when it should be
> expired.  It has been observed that sometimes a subset of nodes in the cluster never expire
> the state, and heap analysis of these nodes reveals that the epstate.isAlive check returns
> true when it should return false; a false result would allow the state to be evicted.
> This may have been affected by CASSANDRA-8336.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
