cassandra-commits mailing list archives

From "Didier (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10371) Decommissioned nodes can remain in gossip
Date Fri, 18 Dec 2015 09:44:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063764#comment-15063764 ]

Didier commented on CASSANDRA-10371:
------------------------------------

Is a fix for this issue planned for the 2.0.x branch?

I have this problem in production with C* 2.0.16. Is it fixed in C* 2.0.17?

Every n minutes we see a gossip flood like this:

 INFO [GossipStage:2] 2015-12-18 10:29:05,082 Gossiper.java (line 962) InetAddress /192.168.128.27 is now DOWN
 INFO [GossipStage:2] 2015-12-18 10:29:05,083 StorageService.java (line 1781) Removing tokens [100029758220565479311893935069170672938, ...., 99324782484008101117663863086419168046] for /192.168.128.27
 INFO [GossipStage:2] 2015-12-18 10:40:44,253 Gossiper.java (line 962) InetAddress /192.168.128.27 is now DOWN
 INFO [GossipStage:2] 2015-12-18 10:40:44,254 StorageService.java (line 1781) Removing tokens [100029758220565479311893935069170672938, ..., 99324782484008101117663863086419168046] for /192.168.128.27
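
To put a number on the "every n minutes" interval, a small script along these lines could pull
the "is now DOWN" events out of a log and report the gap between repeats for each endpoint.
This is only a sketch: the regular expression assumes the log line layout shown above with each
entry on a single line, and the function name down_intervals is made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
#   INFO [GossipStage:2] <date> <time>,<millis> Gossiper.java (line N) InetAddress /<ip> is now DOWN
DOWN_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*InetAddress (/\S+)\s+is now DOWN"
)


def down_intervals(lines):
    """Return (endpoint, seconds_since_previous_DOWN) for every repeated DOWN event."""
    last_seen = {}
    intervals = []
    for line in lines:
        match = DOWN_RE.search(line)
        if match is None:
            continue  # not a DOWN event, e.g. the "Removing tokens" lines
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        endpoint = match.group(2)
        if endpoint in last_seen:
            gap = (when - last_seen[endpoint]).total_seconds()
            intervals.append((endpoint, gap))
        last_seen[endpoint] = when
    return intervals


# The two DOWN lines from the excerpt above, joined onto single lines.
sample = [
    " INFO [GossipStage:2] 2015-12-18 10:29:05,082 Gossiper.java (line 962) InetAddress /192.168.128.27 is now DOWN",
    " INFO [GossipStage:2] 2015-12-18 10:40:44,253 Gossiper.java (line 962) InetAddress /192.168.128.27 is now DOWN",
]

for endpoint, seconds in down_intervals(sample):
    print(endpoint, round(seconds, 3))
```

For the two events quoted above the gap works out to roughly 11 minutes 39 seconds.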

The affected nodes do not appear in system.peers or in nodetool ring/status, and they were
decommissioned properly from the DC.

Do you plan to ship a 2.0.18 release with a fix, or do you recommend upgrading to C* 2.1
or later?

Best regards,

Didier

> Decommissioned nodes can remain in gossip
> -----------------------------------------
>
>                 Key: CASSANDRA-10371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Brandon Williams
>            Assignee: Stefania
>            Priority: Minor
>
> This may apply to other dead states as well.  Dead states should be expired after 3 days.
> In the case of decom we attach a timestamp to let the other nodes know when it should be
> expired.  It has been observed that sometimes a subset of nodes in the cluster never expire
> the state, and through heap analysis of these nodes it is revealed that the epstate.isAlive
> check returns true when it should return false, which would allow the state to be evicted.
> This may have been affected by CASSANDRA-8336.
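
The failure mode described in the issue can be illustrated with a toy model. This is a
simplified sketch, not Cassandra's actual code: EndpointState, should_evict and the field
names are hypothetical, and the only facts taken from the description are the 3-day expiry
window, the expiry timestamp attached on decommission, and the liveness check that gates
eviction.

```python
from dataclasses import dataclass

# Expiry window stated in the issue description: 3 days, here in milliseconds.
THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000


@dataclass
class EndpointState:
    """Hypothetical stand-in for the per-endpoint gossip state."""
    is_alive: bool        # what the liveness check reports for this endpoint
    expire_time_ms: int   # timestamp attached at decommission


def should_evict(state: EndpointState, now_ms: int) -> bool:
    """A dead state is evicted only if it is not alive AND its expiry time has passed."""
    return (not state.is_alive) and now_ms > state.expire_time_ms


decommissioned_at_ms = 0
expire_at_ms = decommissioned_at_ms + THREE_DAYS_MS

# Expected case: the node is marked dead, so the state goes away after the window.
healthy = EndpointState(is_alive=False, expire_time_ms=expire_at_ms)
# Case from the issue: the liveness check wrongly reports true, so eviction never happens.
stuck = EndpointState(is_alive=True, expire_time_ms=expire_at_ms)

later_ms = expire_at_ms + 1
print(should_evict(healthy, later_ms))
print(should_evict(stuck, later_ms))
```

In this model the stuck state survives no matter how much time passes, which matches the
observation that a subset of nodes never expires the decommissioned endpoint.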



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
