ignite-user mailing list archives

From Chris Berry <chriswbe...@gmail.com>
Subject Race Condition at Grid Startup
Date Fri, 19 May 2017 21:12:38 GMT
Hi,

I have a chicken-and-egg problem.
I am trying to create a ConsulIpFinder (a TcpDiscoveryIpFinder), which uses
our Consul-based service discovery under the covers.

(I asked about this without luck here:
http://apache-ignite-users.70518.x6.nabble.com/ConsulIpFinder-TcpDiscoveryIpFinder-issue-td12974.html
)
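One possible way out of the loop is for the IpFinder to register the Node's address as soon as discovery starts, and to return every registered instance rather than only those whose health check passes. In Consul terms that would roughly mean reading the catalog endpoint (`/v1/catalog/service/<name>`) instead of the health endpoint filtered with `?passing`. The sketch below makes no Consul or Ignite calls. `ServiceInstance`, `InMemoryRegistry` and `RegistryIpFinder` are hypothetical names, and `RegistryIpFinder` only mirrors the three method names of Ignite's `TcpDiscoveryIpFinder`. A real finder would extend `TcpDiscoveryIpFinderAdapter` and talk to Consul over HTTP.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one Consul catalog entry (not a real Consul client type).
final class ServiceInstance {
    final String host;
    final int port;
    final boolean passing; // health-check status; deliberately ignored by the finder below

    ServiceInstance(String host, int port, boolean passing) {
        this.host = host;
        this.port = port;
        this.passing = passing;
    }
}

// Hypothetical in-memory registry standing in for Consul.
final class InMemoryRegistry {
    private final Map<String, ServiceInstance> instances = new LinkedHashMap<>();

    synchronized void register(ServiceInstance inst) {
        instances.put(inst.host + ":" + inst.port, inst);
    }

    synchronized void deregister(String host, int port) {
        instances.remove(host + ":" + port);
    }

    synchronized List<ServiceInstance> all() {
        return new ArrayList<>(instances.values());
    }
}

// Same three method names as Ignite's TcpDiscoveryIpFinder, but with no Ignite dependency.
final class RegistryIpFinder {
    private final InMemoryRegistry registry;

    RegistryIpFinder(InMemoryRegistry registry) {
        this.registry = registry;
    }

    // Returns every registered instance, including those whose health check has not
    // passed yet, so Nodes that are still starting can see each other.
    Collection<InetSocketAddress> getRegisteredAddresses() {
        List<InetSocketAddress> out = new ArrayList<>();
        for (ServiceInstance inst : registry.all())
            out.add(new InetSocketAddress(inst.host, inst.port));
        return out;
    }

    // Registers the local Node's discovery address(es) immediately, before any health check passes.
    void registerAddresses(Collection<InetSocketAddress> addrs) {
        for (InetSocketAddress a : addrs)
            registry.register(new ServiceInstance(a.getHostString(), a.getPort(), false));
    }

    // Removes addresses, e.g. when a Node shuts down.
    void unregisterAddresses(Collection<InetSocketAddress> addrs) {
        for (InetSocketAddress a : addrs)
            registry.deregister(a.getHostString(), a.getPort());
    }
}
```

As far as I understand it, Ignite only registers the local Node's address through the finder when the finder reports itself as shared (`isShared()`), so a real Consul-backed finder would likely need to return `true` there.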

My problem is this.
If I start 1 Node, then wait until it is alive, and then start N Nodes, I
never have any issues getting all of my Nodes to find each other in the
Grid. 100% success.
But if I start all N Nodes simultaneously, most Nodes start up believing they
are isolated from each other. Almost 100% failure.
(Note: we use Mesos/Marathon to manage our Nodes, and would like to be able
to start them all simultaneously. We really do not want a special, manual
process.)

The chicken-and-egg problem arises because:
1) I must start Ignite as I am starting up the Node, which means that it
will try to discover the Grid.
2) But, until a Node is started, its Consul Health Check will fail, and
thus, the Node will not appear in Consul, and therefore not in my IpFinder.
Thus, Ignite cannot discover the other Nodes in the Grid, because they
are not yet available to the IpFinder.
So when all the Nodes start at once, none of them is aware of the others.
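The race can be reproduced with a toy model. This is not Ignite code: it assumes a Node becomes visible in the registry only after it has finished starting, and that a Node which sees nobody forms its own Grid. Each entry in `waves` is a group of Nodes started together; `StartupRaceModel` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the startup race (not Ignite code).
// Assumption: a Node is listed in the registry only after its health check passes,
// i.e. after it has already joined or formed a Grid.
final class StartupRaceModel {
    // waves[i] = number of Nodes started together in wave i.
    // Returns how many separate Grids exist once every wave has started.
    static int gridsFormed(int[] waves) {
        List<Integer> registry = new ArrayList<>(); // Grid id of each healthy Node
        Set<Integer> grids = new HashSet<>();
        int nextGrid = 0;
        for (int size : waves) {
            // Every Node in a wave sees the same snapshot: nobody from its own wave is healthy yet.
            List<Integer> snapshot = new ArrayList<>(registry);
            List<Integer> started = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                // A Node that sees no peers forms a new Grid; otherwise it joins the first one it sees.
                int grid = snapshot.isEmpty() ? nextGrid++ : snapshot.get(0);
                grids.add(grid);
                started.add(grid);
            }
            registry.addAll(started); // health checks pass only after startup completes
        }
        return grids.size();
    }
}
```

With five Nodes, `gridsFormed(new int[] {5})` returns 5 (every Node alone), while `gridsFormed(new int[] {1, 4})` returns 1, which matches the failure and success cases described above.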

What I need is either:
1) A way to defer discovery until I start it explicitly, later in the
lifecycle, while still starting enough of Ignite that I can create Caches, etc.
2) Or a way to force a Node to retry joining the Grid.
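Option 1 could be approximated outside Ignite by holding off `Ignition.start(...)` until the registry lists the expected number of Nodes, similar in spirit to Consul's own `bootstrap_expect` setting. This is a sketch under two assumptions: each Node registers itself in Consul before it starts waiting (without waiting for its health check), and the expected count is known up front, for example from the Marathon app's instance count. `PeerWait.awaitPeers` is a hypothetical helper.

```java
import java.net.InetSocketAddress;
import java.time.Duration;
import java.util.Collection;
import java.util.function.Supplier;

// Hypothetical helper: wait until the registry lists at least `expected` addresses
// (including Nodes that are still starting) or until `timeout` elapses.
final class PeerWait {
    static Collection<InetSocketAddress> awaitPeers(Supplier<Collection<InetSocketAddress>> registry,
                                                    int expected, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        Collection<InetSocketAddress> seen = registry.get();
        // Overflow-safe deadline check: compare the difference, not the raw values.
        while (seen.size() < expected && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
            seen = registry.get();
        }
        // May hold fewer than `expected` addresses if the timeout hit; the caller decides what to do.
        return seen;
    }
}
```

Once it returns, the addresses could be handed to discovery, for example as `host:port` strings via `TcpDiscoveryVmIpFinder.setAddresses(...)`, before starting Ignite. In the normal case every Node then begins discovery with the full list instead of an empty one.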

I ask because, even though my IpFinder eventually returns all of the Nodes in
the Grid from getRegisteredAddresses(), a Node that started up isolated never
attempts to re-register itself with the Grid.
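Option 2 could be approximated with a watchdog outside Ignite that notices a Node is alone even though the registry lists others, and then restarts it. This is a sketch, not an Ignite feature: `IsolationWatchdog` is hypothetical, the topology size would come from something like `ignite.cluster().nodes().size()`, and `rejoin` would stop and start the local Node. Requiring a few consecutive isolated observations avoids restarting on a transient view, and letting the lowest listed address stay up as a seed keeps every isolated Node from restarting at the same moment and racing again.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Hypothetical watchdog: restarts a Node that is alone in its topology while the
// registry lists other Nodes. Meant to be called periodically, e.g. from a scheduler.
final class IsolationWatchdog {
    private final String localAddress;                 // this Node's "host:port" as listed in the registry
    private final IntSupplier topologySize;            // how many Nodes this Node can currently see
    private final Supplier<Collection<String>> registry; // addresses currently listed in the registry
    private final Runnable rejoin;                     // stops and restarts the local Node
    private final int strikesBeforeRejoin;
    private int strikes;

    IsolationWatchdog(String localAddress, IntSupplier topologySize,
                      Supplier<Collection<String>> registry, Runnable rejoin, int strikesBeforeRejoin) {
        this.localAddress = localAddress;
        this.topologySize = topologySize;
        this.registry = registry;
        this.rejoin = rejoin;
        this.strikesBeforeRejoin = strikesBeforeRejoin;
    }

    // Returns true if this check triggered a rejoin.
    boolean check() {
        Collection<String> listed = registry.get();
        boolean isolated = topologySize.getAsInt() == 1 && listed.size() > 1;
        // Tie-breaker: the lowest address in sort order stays up as the seed.
        // Any ordering works as long as every Node uses the same one.
        boolean isSeed = !listed.isEmpty() && Collections.min(listed).equals(localAddress);
        strikes = (isolated && !isSeed) ? strikes + 1 : 0;
        if (strikes < strikesBeforeRejoin)
            return false;
        strikes = 0;
        rejoin.run();
        return true;
    }
}
```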

I hope this question makes sense.

I would love to manage this in my code, because right now my only option
appears to be a nasty hack that staggers the start in my startup script
(with a random sleep), and that seems like a terrible option.

Every IpFinder I have read about assumes that somehow, magically, I have a
predefined list of IP:Port pairs.
But that is difficult in the ephemeral world of the Cloud.

Thanks,
-- Chris 



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Race-Condition-at-Grid-Startup-tp13038.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
