ignite-dev mailing list archives

From Denis Magda <dma...@gridgain.com>
Subject Finishing work on IGNITE-752 (Speed up failure detection)
Date Thu, 23 Jul 2015 10:31:58 GMT

This week I have been working on an improvement that detects failures 
at the discovery, communication, and network levels of cluster nodes as 
quickly as possible and lets the user tune this behavior with a 
single configuration parameter.

Failure detection has existed in Ignite for a long time and the user 
can already tune it, BUT there are around *10* configuration parameters 
that have to be set up to achieve the desired result.

Once IGNITE-752 is merged into the main development branch, all of this 
behavior can be controlled with a single parameter - 

Setting the failure detection threshold for a server node makes it 
possible to detect failed nodes in a cluster topology within a time 
equal to the threshold's value and to switch to/keep working with only 
alive nodes.
Setting the threshold for a client node lets us detect connection 
failures between the client and its router node (a server node that is 
part of the topology).
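
To make the idea of a single threshold concrete, here is a minimal, 
self-contained sketch. It is NOT the code from IGNITE-752 and does not use 
any Ignite API; the class and method names (FailureDetectorSketch, 
heartbeat, dropFailed, aliveNodes) are hypothetical. It only illustrates 
the rule "a node that stays silent for longer than the threshold is treated 
as failed, and work continues with the remaining nodes".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration only; not Ignite's implementation. */
public class FailureDetectorSketch {
    /** The single tunable value: maximum allowed silence per node, in ms. */
    private final long thresholdMs;

    /** Last time each node was heard from (sorted for stable output). */
    private final Map<String, Long> lastSeen = new TreeMap<>();

    public FailureDetectorSketch(long thresholdMs) {
        if (thresholdMs <= 0)
            throw new IllegalArgumentException("threshold must be positive");
        this.thresholdMs = thresholdMs;
    }

    /** Records that a node was heard from at the given time. */
    public void heartbeat(String nodeId, long nowMs) {
        lastSeen.put(nodeId, nowMs);
    }

    /**
     * Returns the nodes that have been silent for longer than the threshold
     * and stops tracking them, so only alive nodes remain.
     */
    public List<String> dropFailed(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > thresholdMs)
                failed.add(e.getKey());
        }
        lastSeen.keySet().removeAll(failed);
        return failed;
    }

    /** Returns a snapshot of the nodes still considered alive. */
    public Set<String> aliveNodes() {
        return new TreeSet<>(lastSeen.keySet());
    }

    public static void main(String[] args) {
        FailureDetectorSketch detector = new FailureDetectorSketch(10_000);
        detector.heartbeat("server-1", 0);
        detector.heartbeat("server-2", 0);
        detector.heartbeat("server-1", 8_000); // server-2 stays silent

        System.out.println("failed: " + detector.dropFailed(12_000));
        System.out.println("alive:  " + detector.aliveNodes());
    }
}
```

In this sketch the caller passes the clock value explicitly, which keeps 
the example deterministic; a real detector would read the time itself and 
would also have to account for network timeouts and retries.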

In addition, a bunch of other improvements and simplifications were made 
at the level of TcpDiscoverySpi and TcpCommunicationSpi. The changes are 
aggregated here:

The general review has passed. However, if anyone else wants to review 
the changes or has any thoughts/suggestions, don't hesitate to share them.

Dmitriy S, I would like to ask you to review the documentation changes in 
any case before I merge.

