cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10969) long-running cluster sees bad gossip generation when a node restarts
Date Tue, 05 Jan 2016 20:45:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083749#comment-15083749 ]

Joel Knighton commented on CASSANDRA-10969:
-------------------------------------------

Your observations on this ticket and your comment on [CASSANDRA-8113] are correct; we should
handle the possibility of a legitimately long-running cluster properly. In my opinion, the
current behavior is a bug, and I'll work on a fix.

You are also correct that a rolling restart should fix this, because a generation of 0 (as
after a restart) is special-cased in the check introduced for CASSANDRA-8113.
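
For illustration only, the kind of check being discussed might look like the sketch below. This is not the code from Gossiper.java: the class name, method name, and constant are hypothetical, and it assumes that generations are epoch seconds, that the one-year threshold is 365 days, and that the special case applies to the locally stored generation.

```java
// Hypothetical sketch; names and threshold value are assumptions, not Cassandra's code.
final class GenerationCheck {
    // Assumed threshold: 365 days expressed in seconds.
    static final long MAX_GENERATION_DIFFERENCE = 86400L * 365;

    // Returns true when the received generation would be treated as invalid.
    static boolean isRejected(long localGeneration, long receivedGeneration) {
        // Special case: a local generation of 0 means there is nothing
        // meaningful to compare against, so the received value is accepted.
        if (localGeneration == 0) {
            return false;
        }
        return receivedGeneration > localGeneration + MAX_GENERATION_DIFFERENCE;
    }

    public static void main(String[] args) {
        // Values taken from the log line in the issue description.
        System.out.println(isRejected(1414613355L, 1450978722L));
        // Same received generation, but the local side holds 0.
        System.out.println(isRejected(0L, 1450978722L));
    }
}
```

Under these assumptions the first call is rejected and the second is accepted, which is why restarting the nodes that hold the old generation would clear the error.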

> long-running cluster sees bad gossip generation when a node restarts
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-10969
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10969
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>         Environment: 4-node Cassandra 2.1.1 cluster, each node running on a Linux 2.6.32-431.20.3.dl6.x86_64 VM
>            Reporter: T. David Hudson
>            Assignee: Joel Knighton
>            Priority: Minor
>
> One of the nodes in a long-running Cassandra 2.1.1 cluster (not under my control) restarted.
> The remaining nodes are logging errors like this:
>     "received an invalid gossip generation for peer xxx.xxx.xxx.xxx; local generation = 1414613355, received generation = 1450978722"
> The gap between the local and received generation numbers exceeds the one-year threshold
> added for CASSANDRA-8113.  The system clocks are up-to-date for all nodes.
> If this is a bug, the latest released Gossiper.java code in 2.1.x, 2.2.x, and 3.0.x seems
> not to have changed the behavior that I'm seeing.
> I presume that restarting the remaining nodes will clear up the problem, whence the minor
> priority.
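
To make the reported gap concrete, the two generation numbers from the log line can be read as timestamps. The sketch below assumes that a gossip generation is a Unix epoch time in seconds; the class and method names are hypothetical and exist only for this example.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; assumes generations are epoch seconds.
final class GenerationGap {
    // Elapsed time between the locally stored and the received generation.
    static Duration gap(long localGeneration, long receivedGeneration) {
        return Duration.between(Instant.ofEpochSecond(localGeneration),
                                Instant.ofEpochSecond(receivedGeneration));
    }

    public static void main(String[] args) {
        long local = 1414613355L;    // "local generation" from the log line
        long received = 1450978722L; // "received generation" from the log line

        System.out.println("local:    " + Instant.ofEpochSecond(local));
        System.out.println("received: " + Instant.ofEpochSecond(received));

        Duration d = gap(local, received);
        System.out.println("gap: " + d.getSeconds() + " s, about " + d.toDays() + " days");
    }
}
```

Read this way, the two values are 36365367 seconds apart, roughly 420 days, which is more than one year even though every clock involved is correct.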



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
