ignite-dev mailing list archives

From "Pavel Kovalenko (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-10485) Ability to get know more about cluster state before NODE_JOINED event is fired cluster-wide
Date Fri, 30 Nov 2018 01:16:00 GMT
Pavel Kovalenko created IGNITE-10485:

             Summary: Ability to get know more about cluster state before NODE_JOINED event
is fired cluster-wide
                 Key: IGNITE-10485
                 URL: https://issues.apache.org/jira/browse/IGNITE-10485
             Project: Ignite
          Issue Type: Improvement
          Components: cache
            Reporter: Pavel Kovalenko
             Fix For: 2.8

Currently there is no good way to learn more about the cluster before PME (partition map
exchange) starts on node join.

It might be useful to do some pre-work (activate components if the cluster is active, calculate
baseline affinity, clean up PDS if the baseline changed, etc.) before the actual NODE_JOINED event
is triggered cluster-wide and PME starts.
Such pre-work would significantly speed up PME on node join.
Currently the only place where it can be done is while processing the NodeAdded message on the
local joining node.
But that is not a good idea, because it would freeze processing of new discovery messages cluster-wide.
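
As a rough illustration of the pre-work listed above, the sketch below derives a list of steps from two inputs: whether the cluster is active, and whether the cluster baseline differs from the one the node has stored locally. All names here (PreWorkStep, PreWorkPlanner, plan) are hypothetical and not part of Ignite, and running the baseline affinity calculation unconditionally is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical names for the pre-work items mentioned in the issue. */
enum PreWorkStep { ACTIVATE_COMPONENTS, CALCULATE_BASELINE_AFFINITY, CLEANUP_PDS }

/** Hypothetical planner; not an Ignite class. */
final class PreWorkPlanner {
    /**
     * Decides which pre-work steps a joining node could run before the join event fires.
     *
     * @param clusterActive         whether the cluster reports itself as active
     * @param clusterBaseline       baseline node ids as reported by the cluster
     * @param locallyStoredBaseline baseline node ids the joining node has persisted
     */
    static List<PreWorkStep> plan(boolean clusterActive,
                                  Set<String> clusterBaseline,
                                  Set<String> locallyStoredBaseline) {
        List<PreWorkStep> steps = new ArrayList<>();

        // Components are only activated when the cluster is already active.
        if (clusterActive)
            steps.add(PreWorkStep.ACTIVATE_COMPONENTS);

        // Assumption: baseline affinity is calculated regardless of cluster state.
        steps.add(PreWorkStep.CALCULATE_BASELINE_AFFINITY);

        // PDS cleanup is only needed when the baseline has changed.
        if (!clusterBaseline.equals(locallyStoredBaseline))
            steps.add(PreWorkStep.CLEANUP_PDS);

        return steps;
    }
}
```

Keeping the decision a pure function of the observed state makes it easy to run off the discovery thread, which is the point of moving this work out of NodeAdded message processing.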

I see two ways to implement it:

1) Introduce a new intermediate node state in which the node is discovered, but the discovery
event for its join is not triggered yet. This is the right approach, but a complicated change,
because it requires revisiting the join process in both the Tcp and Zk discovery protocols, with
extra failover scenarios.

2) Try to get this information and do the pre-work before the discovery manager starts, using e.g.
GridRestProcessor. This looks much simpler, but there can be races when the cluster state
changes during the pre-work (deactivation, baseline change). In that case we should roll the
pre-work back or just stop/restart the node to avoid cluster instability. However, these are
rare scenarios in the real world (e.g. starting a baseline node and starting deactivation right
after node recovery is finished).
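
The race handling in option 2 could be sketched as an optimistic read, work, re-read sequence. The snippet below is only a sketch under assumptions: ClusterSnapshot, PreWorkOutcome and OptimisticPreWork are hypothetical names, and the state source stands in for whatever the REST endpoint would return. It does not use any Ignite API.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical snapshot of what a joining node reads before discovery starts. */
final class ClusterSnapshot {
    final boolean active;
    final List<String> baselineNodeIds;

    ClusterSnapshot(boolean active, List<String> baselineNodeIds) {
        this.active = active;
        this.baselineNodeIds = baselineNodeIds;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ClusterSnapshot)) return false;
        ClusterSnapshot other = (ClusterSnapshot) o;
        return active == other.active && baselineNodeIds.equals(other.baselineNodeIds);
    }

    @Override public int hashCode() {
        return Objects.hash(active, baselineNodeIds);
    }
}

/** Hypothetical result of one pre-work attempt. */
enum PreWorkOutcome { COMMITTED, ROLLED_BACK }

/** Hypothetical helper; not an Ignite class. */
final class OptimisticPreWork {
    /**
     * Reads the cluster state, runs the pre-work, then reads the state again.
     * If the two snapshots differ, the pre-work is rolled back.
     */
    static PreWorkOutcome run(Supplier<ClusterSnapshot> stateSource,
                              Runnable preWork,
                              Runnable rollback) {
        ClusterSnapshot before = stateSource.get();

        preWork.run();

        // Deactivation or a baseline change during pre-work shows up as a different snapshot.
        ClusterSnapshot after = stateSource.get();

        if (before.equals(after))
            return PreWorkOutcome.COMMITTED;

        rollback.run();
        return PreWorkOutcome.ROLLED_BACK;
    }
}
```

The second read only narrows the race window: the state can still change between the re-check and the actual join, so the stop/restart fallback mentioned above would still be needed.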

For starters, we can expose the baseline and cluster state in our REST endpoint and try to move
the pre-work mentioned above out of PME.
