Date: Mon, 26 Dec 2016 11:24:58 +0000 (UTC)
From: "Vladislav Pyatkov (JIRA)"

Vladislav Pyatkov created IGNITE-4491:
-----------------------------------------

             Summary: Commutation loss between two nodes leads to hang whole cluster.
                 Key: IGNITE-4491
                 URL: https://issues.apache.org/jira/browse/IGNITE-4491
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 1.8
            Reporter: Vladislav Pyatkov
            Priority: Critical


Reproduction steps:

1) Start the nodes:

DC1                  DC2
1 (10.116.172.1)     8 (10.116.64.11)
2 (10.116.172.2)     7 (10.116.64.12)
3 (10.116.172.3)     6 (10.116.64.13)
4 (10.116.172.4)     5 (10.116.64.14)

Each server node has a client node running on the same host (see the source code in the attachment).

2) Drop the connection between two pairs of nodes, blocking all inbound and outbound traffic.

Between nodes 1 and 8 (10.116.172.1 and 10.116.64.11), run on 10.116.172.1:

iptables -A INPUT -s 10.116.64.11 -j DROP
iptables -A OUTPUT -d 10.116.64.11 -j DROP

Between nodes 4 and 5 (10.116.172.4 and 10.116.64.14), run on 10.116.172.4:

iptables -A INPUT -s 10.116.64.14 -j DROP
iptables -A OUTPUT -d 10.116.64.14 -j DROP

3) After several seconds, the grid stops.

In the logs you can see which nodes were segmented after the traffic was dropped (pay attention to which clients were not segmented):

[12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]

At the same moment, all operations stopped.
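The traffic drop in step 2 can be scripted so that the partition is applied the same way on every run and can be undone afterwards. The sketch below is a hypothetical helper, not part of the original report: the `partition` function name and the `DRY_RUN` variable are assumptions. It appends the same two DROP rules shown in the steps (`iptables -A`) and removes them again with `iptables -D`, which deletes a rule matching the same specification. Executing the rules for real requires root on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): isolate this host from
# one peer by dropping all inbound and outbound traffic, as in step 2.
# Usage: partition.sh drop|restore PEER_IP
# Set DRY_RUN=1 to print the iptables commands instead of executing them.
set -euo pipefail

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add (drop) or delete (restore) the INPUT/OUTPUT DROP rules for one peer.
partition() {
  local action=$1 peer=$2 flag
  case "$action" in
    drop)    flag=-A ;;  # append the DROP rules
    restore) flag=-D ;;  # delete the same rules again
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  run iptables "$flag" INPUT -s "$peer" -j DROP
  run iptables "$flag" OUTPUT -d "$peer" -j DROP
}

# Only act when executed directly with arguments; sourcing just defines functions.
if [ "$#" -gt 0 ] && [ "${BASH_SOURCE[0]:-}" = "$0" ]; then
  partition "$@"
fi
```

For example, running `partition.sh drop 10.116.64.11` on 10.116.172.1 and `partition.sh drop 10.116.64.14` on 10.116.172.4 reproduces step 2, and the same calls with `restore` remove the rules. Deleting by rule specification rather than flushing the chains leaves any unrelated firewall rules on the hosts untouched.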