From: Biren
To: user@ignite.apache.org
Date: Sat, 19 Aug 2017 17:34:39 -0700 (MST)
Subject: Cluster segmentation

Hi,

I have embedded Ignite into my application and am using it for distributed caches. I am running an Ignite cluster with two nodes in my lab environment. From time to time I get a node segmented event, and the node that receives it dies abruptly.

The application is registered for three discovery events: node_left, node_failed and node_segmented. On receiving any of these events, each node checks whether it is now the oldest node in the cluster. This is how it detects that the oldest node has left or failed.
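To make the oldest-node check concrete, here is a minimal plain-Java sketch of the logic. It deliberately does not use the Ignite API: `Member`, `oldest` and `isLocalOldest` are hypothetical names, and the numeric join order is an assumed stand-in for however the cluster orders its members.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Plain-Java model of the "am I the oldest node now?" check (not Ignite API). */
final class OldestNodeCheck {

    /** Hypothetical stand-in for a cluster member: an id plus its join order. */
    static final class Member {
        final String id;
        final long order;

        Member(String id, long order) {
            this.id = id;
            this.order = order;
        }
    }

    /** Returns the member with the smallest join order, i.e. the oldest one. */
    static Optional<Member> oldest(Collection<Member> topology) {
        return topology.stream().min(Comparator.comparingLong((Member m) -> m.order));
    }

    /** True if the local member is the oldest in the given topology snapshot. */
    static boolean isLocalOldest(Member local, Collection<Member> topology) {
        return oldest(topology).map(m -> m.id.equals(local.id)).orElse(false);
    }

    public static void main(String[] args) {
        Member app1 = new Member("application-1", 1);
        Member app2 = new Member("application-2", 2);

        // Both nodes present: application 2 joined later, so it is not the oldest.
        System.out.println(isLocalOldest(app2, List.of(app1, app2)));

        // After application 1 leaves or fails, application 2 becomes the oldest.
        System.out.println(isLocalOldest(app2, List.of(app2)));
    }
}
```

In this model the check is a pure function of the current topology snapshot, so running it from an event handler only reads state and does not change it.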


I am also listening to lifecycle events, specifically before_node_stop and after_node_stop. On receiving either of these events, I need to stop another component of the application.
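A minimal sketch of that lifecycle handling in plain Java follows. It is a model rather than Ignite code: the `LifecycleEvent` enum only mirrors the two event names mentioned above, and `DependentComponent` and `onLifecycleEvent` are hypothetical names. The stop is made idempotent because both events trigger it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Plain-Java model of stopping a dependent component on node-stop events. */
final class LifecycleStopSketch {

    /** Local stand-in that mirrors the two lifecycle event names from the text. */
    enum LifecycleEvent { BEFORE_NODE_STOP, AFTER_NODE_STOP }

    /** Hypothetical dependent component whose stop() is safe to call twice. */
    static final class DependentComponent {
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        private final AtomicInteger shutdownRuns = new AtomicInteger();

        void stop() {
            // compareAndSet lets only the first caller run the shutdown logic.
            if (stopped.compareAndSet(false, true)) {
                shutdownRuns.incrementAndGet();
            }
        }

        boolean isStopped() {
            return stopped.get();
        }

        int shutdownRuns() {
            return shutdownRuns.get();
        }
    }

    /** Stops the component on either event; the second call is a no-op. */
    static void onLifecycleEvent(LifecycleEvent evt, DependentComponent component) {
        switch (evt) {
            case BEFORE_NODE_STOP:
            case AFTER_NODE_STOP:
                component.stop();
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        DependentComponent component = new DependentComponent();
        onLifecycleEvent(LifecycleEvent.BEFORE_NODE_STOP, component);
        onLifecycleEvent(LifecycleEvent.AFTER_NODE_STOP, component);
        System.out.println(component.isStopped() + " " + component.shutdownRuns());
    }
}
```

With this shape the component is already stopped after before_node_stop alone, so it does not depend on after_node_stop ever arriving.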


  1. What are the possible causes of a node_segmented event?
    1. One cause is obviously a network glitch, where a node loses connectivity with the other members.
    2. Can high memory usage or a long GC pause cause segmentation?
    3. Is there a way to find out the cause of the segmentation?
  2. After getting the node_segmented event, I immediately got a before_node_stop event, but after_node_stop did not follow. The node was left in an inconsistent state and never recovered.
    1. Is it possible that my attempt to get the oldest node in the cluster, made on receiving the node_segmented event, caused the node to stop?

Event timeline from both nodes:

Application 1:

08/19/17 10:40:10 : received node failed event. The event was caused by application 2.

08/19/17 10:40:10 : [10:40:10] Topology snapshot [ver=3, servers=1, clients=0, CPUs=32, heap=14.0GB]

Application 2:

08/19/17 10:40:28 : received node segmented event. The event was caused by application 2

08/19/17 10:40:28 : Checking if oldest has changed

08/19/17 10:40:28 : Ignite Lifecycle event received: BEFORE_NODE_STOP. Fires event to stop another component

08/19/17 10:40:28 : dependent component stops

08/19/17 10:40:28 : received node failed event. The event was caused by application 1.

08/19/17 10:40:10 : Topology snapshot [ver=3, servers=1, clients=0, CPUs=32, heap=14.0GB]


Thanks,

Biren



View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Cluster-segmentation-tp16314.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.