ignite-user mailing list archives

From Biren <biren.s...@servicenow.com>
Subject Cluster segmentation
Date Sun, 20 Aug 2017 00:34:39 GMT
Hi,
I have embedded Ignite into my application and am using it for distributed caches. I am running
an Ignite cluster with two nodes in my lab environment. From time to time I get a
node segmented event, and the node that receives it dies abruptly.

The application is registered for three discovery events: node_left, node_failed and node_segmented.
On receiving any of these events, each node checks whether it is now the oldest node in the cluster. This is
to detect whether the oldest node has left or failed.
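
The oldest-node check described above can be sketched without any Ignite dependency. This is a minimal model, not the application's actual code: `OldestNodeCheck`, `NodeInfo`, `oldest` and `isLocalOldest` are hypothetical names. It assumes each node carries a join-order number (comparable to what Ignite exposes through `ClusterNode.order()`), so the oldest node is the one with the smallest order. The check works on a topology snapshot passed in by the caller rather than querying the cluster from inside the event handler.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

/** Library-free sketch of the "am I the oldest node now?" check. */
final class OldestNodeCheck {
    private OldestNodeCheck() {}

    /** Hypothetical stand-in for a cluster node: an id plus its join order. */
    static final class NodeInfo {
        final String id;
        final long order; // assumption: a smaller order means the node joined earlier

        NodeInfo(String id, long order) {
            this.id = id;
            this.order = order;
        }
    }

    /** Returns the node with the smallest join order, or empty for an empty snapshot. */
    static Optional<NodeInfo> oldest(Collection<NodeInfo> snapshot) {
        return snapshot.stream().min(Comparator.comparingLong(n -> n.order));
    }

    /** True if the node identified by localId is the oldest node in the snapshot. */
    static boolean isLocalOldest(String localId, Collection<NodeInfo> snapshot) {
        return oldest(snapshot).map(n -> n.id.equals(localId)).orElse(false);
    }

    public static void main(String[] args) {
        NodeInfo app1 = new NodeInfo("app1", 1);
        NodeInfo app2 = new NodeInfo("app2", 2);

        // Both nodes present: is app2 the oldest?
        System.out.println(isLocalOldest("app2", Arrays.asList(app1, app2)));
        // After app1 has left or failed: is app2 the oldest?
        System.out.println(isLocalOldest("app2", Arrays.asList(app2)));
    }
}
```

In an embedded Ignite application, a check like this would typically be called from a listener registered through `ignite.events().localListen(...)` for `EVT_NODE_LEFT`, `EVT_NODE_FAILED` and `EVT_NODE_SEGMENTED`; that wiring is left out here so the sketch stays self-contained.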

I am also listening to lifecycle events. I am interested in the before_node_stop and after_node_stop
events. On receiving these events, I need to stop another component of the application.
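
The lifecycle handling described above could look roughly like the following dependency-free sketch. `StopOnNodeShutdown`, `LifecycleEvent` and `stopDependent` are hypothetical names. The enum only mirrors the four values of Ignite's `LifecycleEventType` so that the example compiles on its own. The sketch assumes the dependent component should be stopped exactly once, on whichever stop event arrives first, so that a missing after_node_stop event does not matter.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: stop a dependent component once, on the first node-stop lifecycle event. */
final class StopOnNodeShutdown {
    /** Local copy of the lifecycle event names; an assumption made to avoid an Ignite dependency. */
    enum LifecycleEvent { BEFORE_NODE_START, AFTER_NODE_START, BEFORE_NODE_STOP, AFTER_NODE_STOP }

    private final Runnable stopDependent;
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    StopOnNodeShutdown(Runnable stopDependent) {
        this.stopDependent = stopDependent;
    }

    /** Runs the stop action on the first stop event only; later stop events are ignored. */
    void onLifecycleEvent(LifecycleEvent evt) {
        boolean isStopEvent =
            evt == LifecycleEvent.BEFORE_NODE_STOP || evt == LifecycleEvent.AFTER_NODE_STOP;

        // compareAndSet guarantees the action runs at most once, even with concurrent callers.
        if (isStopEvent && stopped.compareAndSet(false, true)) {
            stopDependent.run();
        }
    }

    boolean isStopped() {
        return stopped.get();
    }

    public static void main(String[] args) {
        StopOnNodeShutdown handler =
            new StopOnNodeShutdown(() -> System.out.println("dependent component stops"));

        handler.onLifecycleEvent(LifecycleEvent.AFTER_NODE_START); // not a stop event
        handler.onLifecycleEvent(LifecycleEvent.BEFORE_NODE_STOP); // triggers the stop action
        handler.onLifecycleEvent(LifecycleEvent.AFTER_NODE_STOP);  // ignored, already stopped
    }
}
```

With Ignite itself, the same logic would sit inside a `LifecycleBean` implementation, whose `onLifecycleEvent(LifecycleEventType evt)` method receives the real event type.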


  1.  What are the possible reasons for getting a node_segmented event?
     *   One reason is obviously a network glitch, where a node loses connectivity with the other members.
     *   Can high memory usage or a long GC pause be a reason for segmentation?
     *   Is there a way to get the cause of the segmentation?
  2.  After getting the node_segmented event, I immediately got a before_node_stop event, but after_node_stop
      did not follow. So the node was left in an inconsistent state and never recovered from it.
     *   Is it possible that my attempt to get the oldest node in the cluster, made on receiving
         the node_segmented event, caused the node to stop?

Event timeline from both nodes:

Application 1:
08/19/17  10:40:10 :  received node failed event. The event was caused by application 2.
08/19/17  10:40:10 :  [10:40:10] Topology snapshot [ver=3, servers=1, clients=0, CPUs=32, heap=14.0GB]

Application 2:
08/19/17 10:40:28 : received node segmented event. The event was caused by application 2
08/19/17 10:40:28 : Checking if oldest has changed
08/19/17 10:40:28 : Ignite Lifecycle event received: BEFORE_NODE_STOP. Fires event to stop another component
08/19/17 10:40:28 : dependent component stops
08/19/17 10:40:28 : received node failed event. The event was caused by application 1.
08/19/17 10:40:10 : Topology snapshot [ver=3, servers=1, clients=0, CPUs=32, heap=14.0GB]

Thanks,
Biren




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Cluster-segmentation-tp16314.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.