cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live
Date Fri, 20 Nov 2015 07:28:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015353#comment-15015353
] 

Stefania commented on CASSANDRA-10243:
--------------------------------------

Thanks for the review! I plan on working early next week on your comments, see answers below,
as well as any other comments you may have.

bq. Is it necessary to check if a node is in dead state for the purpose of this snitch check?
In my understanding, if a node is on a dead state, it's neither live nor member of the ring,
so I didn't get why that check was done previously on getLiveTokenOwners() in the first place,
do you know? Maybe historical reasons? I'd prefer to have a simpler isLiveMember() check on
StorageService (since it checks both gossip and tokenmetadata), and this method would basically
return Gossiper.isLiveEndpoint(endpoint) && tokenMetadata.isMember(ep), but this is
a personal thing so it's up to you to take this suggestion.

>From a quick code analysis I think leaving nodes are still members but their state is
dead? In any case, my preference would be to leave existing code unchanged, especially if
this goes to 2.1, but I am not opposed to simplifying the new liveliness check for the snitch
to what you suggested, {{Gossiper.isLiveEndpoint(endpoint) && tokenMetadata.isMember(ep)}},
since this would mean leaving nodes are also live, which is safer I believe.

bq. Did you intend to decrease the default snitch configuration refresh period from 60 to
5 seconds?

Yes I did. I reduced it so that the dtests could complete in a reasonable amount of time.
I don't see why wait for up to 60 seconds before reloading a config file, 5 seconds is a pretty
long time and it should not have any adverse impact.

bq. On GossipingPropertyFileSnitch I think it's only necessary to check if the dc/rack changed,
or do you see a situation where one would want to live change the rack/dc of a non-ring memmber?

Maybe start the node with {{-Djoin_ring=false}}, change the rack and then join the ring? If
the node is not live I'd say let them change the rack/dc even though I agree it doesn't make
much sense. I'm really undecided to be honest, basically the only reason to reload the GPFS
config should be to change {{preferLocal}} so maybe we should never allow chaning dc/rack
for GPFS, or remove the config reload altogether as suggested in CASSANDRA-9474.
 
bq. Also on the GossipingPropertyFileSnitch maybe it's not necessary to updateTopology/invalidateCachedRing,
since topology change is not allowed anymore?

{{updateTopology}} definitely no longer makes sense but {{invalidateCachedRing}} is probably
safer to keep it, at least on startup. However see my next question, in which case we would
need to keep {{updateTopology}}.

Should we add a JVM property to override the liveliness checks, just as a safety measure in
case someone has a legitimate reason to change rack/dc of a live node?

> Warn or fail when changing cluster topology live
> ------------------------------------------------
>
>                 Key: CASSANDRA-10243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is almost always
the wrong thing to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message