cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live
Date Fri, 20 Nov 2015 07:28:11 GMT


Stefania commented on CASSANDRA-10243:

Thanks for the review! I plan to work on your comments early next week, along with any other
comments you may have. See my answers below.

bq. Is it necessary to check if a node is in dead state for the purpose of this snitch check?
In my understanding, if a node is in a dead state, it's neither live nor a member of the ring,
so I didn't get why that check was done previously in getLiveTokenOwners() in the first place,
do you know? Maybe historical reasons? I'd prefer to have a simpler isLiveMember() check on
StorageService (since it checks both gossip and tokenmetadata), and this method would basically
return Gossiper.isLiveEndpoint(endpoint) && tokenMetadata.isMember(ep), but this is
a personal preference so it's up to you whether to take this suggestion.

From a quick code analysis, I think leaving nodes are still members but their state is
dead? In any case, my preference would be to leave existing code unchanged, especially if
this goes to 2.1, but I am not opposed to simplifying the new liveness check for the snitch
to what you suggested, {{Gossiper.isLiveEndpoint(endpoint) && tokenMetadata.isMember(ep)}},
since this would mean leaving nodes are also treated as live, which I believe is safer.
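
For illustration only, the simplified check discussed above could look roughly like the sketch
below. It is not the patch itself: {{LiveMemberCheck}} and its two predicates are hypothetical
stand-ins for the gossip and token metadata lookups, and endpoints are plain strings to keep the
example self-contained.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for the proposed isLiveMember() check.
// The real code would consult the gossiper and the token metadata instead.
final class LiveMemberCheck {
    private final Predicate<String> isLiveEndpoint; // stand-in for the gossip liveness lookup
    private final Predicate<String> isMember;       // stand-in for the ring membership lookup

    LiveMemberCheck(Predicate<String> isLiveEndpoint, Predicate<String> isMember) {
        this.isLiveEndpoint = isLiveEndpoint;
        this.isMember = isMember;
    }

    // An endpoint is a live member only if gossip sees it as live
    // and it is also a member of the ring.
    boolean isLiveMember(String endpoint) {
        return isLiveEndpoint.test(endpoint) && isMember.test(endpoint);
    }
}
```

With these stand-ins, a node counts as a live member only when both lookups agree, so a leaving
node that is still a ring member and still seen as live by gossip would pass the check.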

bq. Did you intend to decrease the default snitch configuration refresh period from 60 to
5 seconds?

Yes, I did. I reduced it so that the dtests could complete in a reasonable amount of time.
I don't see why we should wait up to 60 seconds before reloading a config file; 5 seconds is
already a fairly long time and should not have any adverse impact.

bq. On GossipingPropertyFileSnitch I think it's only necessary to check if the dc/rack changed,
or do you see a situation where one would want to live change the rack/dc of a non-ring member?

Maybe start the node with {{-Djoin_ring=false}}, change the rack and then join the ring? If
the node is not live I'd say let them change the rack/dc, even though I agree it doesn't make
much sense. I'm really undecided, to be honest. Basically, the only reason to reload the GPFS
config should be to change {{preferLocal}}, so maybe we should never allow changing dc/rack
for GPFS, or remove the config reload altogether as suggested in CASSANDRA-9474.
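
One of the options weighed above (refuse a dc/rack change on a live ring member but still pick
up {{preferLocal}}) can be sketched as follows. This is only one possible policy, not the
behaviour of the patch: {{SnitchConfig}} and {{ReloadGuard}} are hypothetical names, and the
liveness result is passed in as a plain boolean.

```java
// Hypothetical snapshot of the settings read from the snitch properties file.
final class SnitchConfig {
    final String dc;
    final String rack;
    final boolean preferLocal;

    SnitchConfig(String dc, String rack, boolean preferLocal) {
        this.dc = dc;
        this.rack = rack;
        this.preferLocal = preferLocal;
    }
}

// Hypothetical reload policy: a live ring member keeps its dc/rack.
final class ReloadGuard {
    // Returns the configuration that should be in effect after a reload attempt.
    static SnitchConfig reload(SnitchConfig current, SnitchConfig loaded, boolean nodeIsLiveMember) {
        boolean topologyChanged = !current.dc.equals(loaded.dc) || !current.rack.equals(loaded.rack);
        if (topologyChanged && nodeIsLiveMember) {
            // Reject the topology change but still accept the new preferLocal value.
            return new SnitchConfig(current.dc, current.rack, loaded.preferLocal);
        }
        // Either nothing topological changed or the node is not a live member.
        return loaded;
    }
}
```

A real implementation would also need to decide whether to warn or fail when the change is
rejected, which is the open question of this ticket.
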

bq. Also on the GossipingPropertyFileSnitch maybe it's not necessary to updateTopology/invalidateCachedRing,
since topology change is not allowed anymore?

{{updateTopology}} definitely no longer makes sense, but it is probably safer to keep
{{invalidateCachedRing}}, at least on startup. However, see my next question: if we add such
an override, we would need to keep {{updateTopology}}.

Should we add a JVM property to override the liveness checks, just as a safety measure in
case someone has a legitimate reason to change rack/dc of a live node?
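
As a rough sketch of what such an override could look like, assuming a made-up property name
({{example.allow_live_topology_change}} is purely illustrative and is not an existing Cassandra
flag) and a hypothetical {{TopologyChangeOverride}} helper:

```java
// Hypothetical helper showing a JVM system property used as an escape hatch.
final class TopologyChangeOverride {
    // Illustrative property name only; not an existing Cassandra option.
    static final String PROPERTY = "example.allow_live_topology_change";

    // A dc/rack change is allowed when the node is not a live member,
    // or when the operator has explicitly set the override property to "true".
    static boolean isTopologyChangeAllowed(boolean nodeIsLiveMember) {
        // Boolean.getBoolean is true only if the property exists and equals "true", ignoring case.
        return !nodeIsLiveMember || Boolean.getBoolean(PROPERTY);
    }
}
```

Under this assumption, an operator would start the JVM with
{{-Dexample.allow_live_topology_change=true}} to bypass the check, and the default stays safe.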

> Warn or fail when changing cluster topology live
> ------------------------------------------------
>                 Key: CASSANDRA-10243
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 2.1.x
> Moving a node from one rack to another in the snitch, while it is alive, is almost always
> the wrong thing to do.

This message was sent by Atlassian JIRA
