ignite-dev mailing list archives

From Alexey Goncharuk <alexey.goncha...@gmail.com>
Subject Re: Failed to wait for initial partition map exchange
Date Thu, 14 Jul 2016 21:02:21 GMT
This is a cross-post from the user list.

We have hit this issue many times before, and many users have complained
about the whole cluster freezing. We can protect a cluster from this
situation simply by dropping non-responsive nodes from the cluster.
Of course, we still need to get to the root cause, and killing
nodes may cause some data loss in the cluster, but I think that is better
than restarting the whole cluster from scratch.

To summarize, I suggest 'killing' non-responsive nodes, i.e. removing them
from the topology, after some timeout in the exchange future.
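
To make the idea concrete, here is a rough sketch in plain Java. It is
not Ignite code: the class ExchangeTimeoutSketch, its methods, and the
node IDs are all hypothetical, and the current time is passed in
explicitly to keep the example deterministic. It only illustrates the
proposed behaviour: track which nodes have acknowledged the exchange,
and once the timeout has elapsed, report the silent ones as candidates
for removal.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; none of these names are Ignite APIs. */
final class ExchangeTimeoutSketch {
    /** Nodes taking part in the exchange, mapped to "has acknowledged". */
    private final Map<String, Boolean> acks = new ConcurrentHashMap<>();
    private final long startNanos;
    private final long timeoutNanos;

    ExchangeTimeoutSketch(Set<String> nodeIds, long timeoutMillis, long nowNanos) {
        for (String id : nodeIds)
            acks.put(id, false);
        this.startNanos = nowNanos;
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    /** Records that a node responded; unknown node IDs are ignored. */
    public void onAck(String nodeId) {
        acks.computeIfPresent(nodeId, (id, old) -> true);
    }

    /** The exchange completes once every remaining node has acknowledged. */
    public boolean isDone() {
        return !acks.containsValue(false);
    }

    /** Empty before the timeout; afterwards, the nodes that never responded. */
    public Set<String> nodesToDrop(long nowNanos) {
        Set<String> res = new TreeSet<>();
        if (nowNanos - startNanos < timeoutNanos)
            return res;
        for (Map.Entry<String, Boolean> e : acks.entrySet())
            if (!e.getValue())
                res.add(e.getKey());
        return res;
    }

    /** Stops waiting for the given nodes, as if they had left the topology. */
    public void drop(Set<String> nodeIds) {
        acks.keySet().removeAll(nodeIds);
    }
}
```

In this sketch, dropping the silent nodes lets isDone() return true for
the remaining ones, so the rest of the cluster can proceed instead of
waiting forever. How the removal would actually be done in Ignite, and
what the timeout should be, are left open here.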
Thoughts?
