ignite-user mailing list archives

From Chris Berry <chriswbe...@gmail.com>
Subject Re: Race Condition at Grid Startup
Date Tue, 23 May 2017 20:19:51 GMT
Hi Yakov,

What you are asking for is difficult.

As I've explained, Ignite Nodes will not show up in Consul until they are
capable of responding to their Health Checks. This means they must be
initialized and capable of responding to a Request for a Health Check with
"OK". Currently, this means that Ignite must be started -- so that we can
configure Caches & DataStreamers and wire them with all the other Dependency
Injected Beans in the system, so that the entire system can be configured
and respond to Requests.

So we have a chicken and egg situation.

What is needed is to allow us to either:

1) Postpone a Node's attempt to join the Cluster until it is alive and well.
In other words, to wait for a lifecycle event that the Node is started, and
to fire an explicit call to "join()".

2) To allow a Node to explicitly attempt to re-join the Cluster. Again, this
would be fired by a lifecycle event.
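
To make the two options above concrete, here is a minimal sketch of the kind
of gate being asked for. It is not an Ignite API: DeferredJoinGate,
markHealthy() and joinWhenHealthy() are hypothetical names, and the join
action is just a Runnable standing in for whatever explicit "join()" call
the discovery layer would need to expose.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical gate (not part of Ignite): holds back a join action until the app reports healthy. */
final class DeferredJoinGate {
    private final CountDownLatch healthy = new CountDownLatch(1);
    private final AtomicBoolean joined = new AtomicBoolean(false);
    private final Runnable joinAction; // stand-in for an explicit "join()" call

    DeferredJoinGate(Runnable joinAction) {
        this.joinAction = joinAction;
    }

    /** Called from the application's "server is started" lifecycle event. */
    void markHealthy() {
        healthy.countDown();
    }

    /**
     * Waits up to the timeout for the health signal. Returns false if the signal
     * did not arrive in time (or the wait was interrupted), true otherwise.
     * The join action is run at most once across all calls.
     */
    boolean joinWhenHealthy(long timeout, TimeUnit unit) {
        try {
            if (!healthy.await(timeout, unit)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        if (joined.compareAndSet(false, true)) {
            joinAction.run();
        }
        return true;
    }
}
```

The same gate would cover option 2 if the join action were a re-join: the
lifecycle event calls markHealthy(), and whichever thread is waiting fires
the join exactly once.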

I have read all the existing implementations, and they all seem to rely on
the fact that somehow, magically a Node will know the other Nodes in the
Cluster. I suspect this is because they use some static List. But clearly,
if I start 10 Nodes simultaneously in the Cloud, this is difficult. Nodes
will not have IPs (or, in Mesos/Marathon, Ports) until they are started.

The point is that Consul is controlling the List of Cluster Nodes, not
Ignite. Nodes register during startup (in startup scripts, using Container
Pilot), but they are not "seen" until they pass their Health Checks.
Conversely, as Nodes come and go, Consul is aware, and will always return
the current known Cluster List.  
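
The visibility rule described above can be modeled in a few lines. This is
a toy in-memory stand-in, not the Consul client API and not the
ConsulIpFinder from this thread: HealthGatedRegistry and its methods are
hypothetical names, used only to show that a registered Node stays
invisible until its Health Check passes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a health-gated registry (hypothetical; not the Consul API). */
final class HealthGatedRegistry {
    // address -> whether its latest health check passed
    private final Map<String, Boolean> passing = new LinkedHashMap<>();

    /** A node registers at startup; it is not yet visible to anyone. */
    void register(String address) {
        passing.putIfAbsent(address, false);
    }

    /** Records a health check result for an already registered node. */
    void reportHealth(String address, boolean ok) {
        if (passing.containsKey(address)) {
            passing.put(address, ok);
        }
    }

    /** A node that goes away is dropped from the registry. */
    void deregister(String address) {
        passing.remove(address);
    }

    /** Only nodes whose health check currently passes are returned. */
    List<String> visibleAddresses() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : passing.entrySet()) {
            if (e.getValue()) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

An IP finder backed by such a registry would presumably return
visibleAddresses() whenever discovery asks for the known addresses, which
is why a Node that joins before its own check passes sees an empty or
partial list.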

You can see this reflected in the final form of the ConsulIpFinder I posted
above. (Not the one you quoted)
I modeled that Impl after this code: 

The problem is simply that we have no programmatic control over when Join()
is called.

NOTE: if there is no way this can get corrected, I will have to somehow
rewrite my app to defer the Ignite.start() until I get a "Server is started"
event, which implies that I will also have to "lazy init" all of my caches,
etc. This is a pretty large refactoring. But if I must, I will do it. Although,
I must say, I suspect that many others will find themselves in the same boat
as I...
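
For illustration only, the refactoring described in the NOTE might take
roughly this shape. LazyGrid is a hypothetical holder, not code from the
app or from Ignite: the type parameter G stands in for the grid handle and
C for a cache handle, the starter is assumed to wrap the real grid startup
call, and the cache factory is assumed to wrap a get-or-create cache call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import java.util.function.Supplier;

/** Hypothetical holder: starts the grid on an event and creates caches lazily. */
final class LazyGrid<G, C> {
    private final Supplier<G> starter;                  // assumed to wrap the grid startup call
    private final BiFunction<G, String, C> cacheFactory; // assumed to wrap cache creation
    private final Map<String, C> caches = new ConcurrentHashMap<>();
    private volatile G grid;

    LazyGrid(Supplier<G> starter, BiFunction<G, String, C> cacheFactory) {
        this.starter = starter;
        this.cacheFactory = cacheFactory;
    }

    /** Wire this to the "Server is started" event; repeated calls start the grid only once. */
    synchronized void onServerStarted() {
        if (grid == null) {
            grid = starter.get();
        }
    }

    /** Creates the named cache on first use; fails fast if the grid is not up yet. */
    C cache(String name) {
        G g = grid;
        if (g == null) {
            throw new IllegalStateException("grid not started yet");
        }
        return caches.computeIfAbsent(name, n -> cacheFactory.apply(g, n));
    }
}
```

The cost is visible even in the sketch: every bean that used to receive a
ready cache at wiring time now has to go through cache(name) and tolerate
the "not started yet" case.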

Thanks much, 
-- Chris 

View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Race-Condition-at-Grid-Startup-tp13038p13101.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
