ignite-dev mailing list archives

From Alexey Goncharuk <alexey.goncha...@gmail.com>
Subject Re: Failed to wait for initial partition map exchange
Date Thu, 14 Jul 2016 22:51:33 GMT
>
> Alexey, I like the idea in general, but killing non-responsive nodes seems
> a bit drastic to me. How about this approach:
>
> - print out IDs/IPs of non-responsive nodes at all times
> - introduce a certain kill timeout for non-responsive nodes (-1 means
> disabled)
> - the timeout should be at least a minute after the 1st non-responsive node
> message is printed
> - when the timeout expires, we should kill the nodes and automatically
> collect their thread dumps
> - we should print out a message asking users to provide these thread dumps
> to us via Jira or dev list
>
> What do you think?
>

Sounds like a plan. I will create a ticket soon if there are no objections.

--AG
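
The proposal quoted above could be sketched roughly as the small policy class below. This is only an illustration of the described steps (log the node every time, allow -1 to disable killing, require at least a minute between the first warning and the kill), not Ignite code: the class `NonResponsiveNodePolicy`, its methods, and the `Action` enum are hypothetical names, and the actual node kill and thread-dump collection are left to the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed policy; not part of the Ignite API.
final class NonResponsiveNodePolicy {
    /** What the caller should do after a non-responsive report. */
    public enum Action { WARN, KILL_AND_DUMP }

    /** A timeout of -1 disables killing, as proposed in the thread. */
    public static final long DISABLED = -1L;

    /** The kill must come at least a minute after the first warning. */
    public static final long MIN_TIMEOUT_MS = 60_000L;

    private final long killTimeoutMs;

    /** Node ID -> time (ms) at which the first warning was printed. */
    private final Map<String, Long> firstWarnedAt = new HashMap<>();

    NonResponsiveNodePolicy(long killTimeoutMs) {
        if (killTimeoutMs != DISABLED && killTimeoutMs < MIN_TIMEOUT_MS)
            throw new IllegalArgumentException(
                "Kill timeout must be -1 or at least " + MIN_TIMEOUT_MS + " ms");
        this.killTimeoutMs = killTimeoutMs;
    }

    /** Called each time a node is observed to be non-responsive. */
    Action onNonResponsive(String nodeId, String addr, long nowMs) {
        // Remember when this node was first reported.
        long first = firstWarnedAt.computeIfAbsent(nodeId, id -> nowMs);

        // Always print the ID/IP of the non-responsive node.
        System.out.println("Non-responsive node [id=" + nodeId + ", addr=" + addr + "]");

        if (killTimeoutMs != DISABLED && nowMs - first >= killTimeoutMs) {
            // The caller is expected to collect thread dumps and stop the node.
            System.out.println("Kill timeout expired for node " + nodeId
                + "; please send its thread dumps via Jira or the dev list.");
            firstWarnedAt.remove(nodeId);
            return Action.KILL_AND_DUMP;
        }
        return Action.WARN;
    }

    /** Called when a node responds again; resets its timer. */
    void onResponsive(String nodeId) {
        firstWarnedAt.remove(nodeId);
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic and easy to test; a real implementation would more likely read a clock and hook into the discovery or exchange code.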
