ignite-user mailing list archives

From Chris Berry <chriswbe...@gmail.com>
Subject Re: Race Condition at Grid Startup
Date Sat, 20 May 2017 13:58:35 GMT
We have written our own `ConsulIpFinder extends TcpDiscoveryIpFinderAdapter` 

It works like this: 
1) As Ignite Nodes are started (inside Docker containers), their Ignite
Discovery Port is registered with Consul (using Container Pilot within our
start-up script) as a "Service" 
2) When Ignite is started in a given Node, a `ConsulIpFinder` is registered
with `discoverySpi.setIpFinder()`.
3) This `ConsulIpFinder` queries Consul for all “passing” Services with a
registered Ignite Discovery Port.
4) It then builds the current List of Host:Port for all Cluster members
(returned from `getRegisteredAddresses()`).
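
To make steps 3 and 4 concrete, here is a minimal sketch of the filtering logic only. It is not our actual `ConsulIpFinder`: the names `ConsulAddressResolver` and `ServiceEntry` are hypothetical, and the `Supplier` stands in for the real Consul query. In the real class this logic would live in the `getRegisteredAddresses()` override of the `TcpDiscoveryIpFinderAdapter` subclass.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: turns Consul health results into discovery addresses.
class ConsulAddressResolver {

    // Assumed shape of one service instance returned by a Consul health query.
    static final class ServiceEntry {
        final String host;
        final int discoveryPort;
        final String status; // e.g. "passing" or "critical"

        ServiceEntry(String host, int discoveryPort, String status) {
            this.host = host;
            this.discoveryPort = discoveryPort;
            this.status = status;
        }
    }

    // Stand-in for the real Consul client call (assumption, not a Consul API).
    private final Supplier<List<ServiceEntry>> consulQuery;

    ConsulAddressResolver(Supplier<List<ServiceEntry>> consulQuery) {
        this.consulQuery = consulQuery;
    }

    // Same return type as the IpFinder method of the same name.
    Collection<InetSocketAddress> getRegisteredAddresses() {
        List<InetSocketAddress> addrs = new ArrayList<>();
        for (ServiceEntry e : consulQuery.get()) {
            // Only Services whose Health Checks pass count as Cluster members.
            if ("passing".equals(e.status)) {
                addrs.add(InetSocketAddress.createUnresolved(e.host, e.discoveryPort));
            }
        }
        return addrs;
    }
}
```

Note that if no Service is "passing" at the moment of the query, this returns an empty collection.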

A Service will not register itself as "passing" until its Health Checks
return "OK".
This cannot happen until the application is fully initialized and able to
receive Requests (i.e. respond to the Health Check).
And by that time, we have already initialized Ignite (i.e. called

Which, as you can see, is a chicken-and-egg situation.

If we start 1 Node first, then we have a seed for the Cluster, and it all
But when they all start simultaneously -- as is most common in a Cloud
environment -- they almost never find each other, because the List of
Cluster members becomes available too late in the game.

It really seems that deferring discovery until it can be explicitly invoked
makes great sense.
Then it can be invoked as a response to a lifecycle event (e.g. a "Server
started" event).
You could simply default it to start during initialization like today, but
give us an option...

Note that we cannot defer the starting of Ignite, because it is heavily
involved in the initialization process -- creating caches, data streamers
and such --  and wiring them into different Components.

The other workaround would be to allow rediscovery.
Then we can force the initial discovery during Ignite.start() to be a
no-op (i.e. return an empty List of Nodes from the IpFinder), and simply force
a rediscovery when the system is alive-and-well.
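
A rough sketch of the IpFinder half of this idea, under stated assumptions: `GatedAddressSource` is a hypothetical name, and the delegate `Supplier` stands in for the real Consul lookup. It only shows how the address list could be held empty until the application says it is ready. The forced rediscovery itself is the feature being asked for and is not shown here as an existing Ignite call.

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical wrapper: reports no addresses until explicitly opened.
class GatedAddressSource {

    // Stand-in for the real lookup (e.g. the Consul query); an assumption.
    private final Supplier<Collection<InetSocketAddress>> delegate;
    private final AtomicBoolean open = new AtomicBoolean(false);

    GatedAddressSource(Supplier<Collection<InetSocketAddress>> delegate) {
        this.delegate = delegate;
    }

    // Call from a lifecycle hook once the system is alive-and-well.
    void open() {
        open.set(true);
    }

    // While closed, initial discovery sees an empty List of Nodes.
    Collection<InetSocketAddress> getRegisteredAddresses() {
        if (!open.get()) {
            return Collections.<InetSocketAddress>emptyList();
        }
        return delegate.get();
    }
}
```

The gate is an `AtomicBoolean` because the lifecycle hook and the discovery code would most likely run on different threads.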

-- Chris

View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Race-Condition-at-Grid-Startup-tp13038p13045.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
