Mailing-List: contact issues-help@ignite.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@ignite.apache.org
Date: Mon, 26 Dec 2016 11:24:58 +0000 (UTC)
From: "Vladislav Pyatkov (JIRA)" <jira@apache.org>
To: issues@ignite.apache.org
Message-ID: <JIRA.13030566.1482751444000.594971.1482751498477@Atlassian.JIRA>
In-Reply-To: <JIRA.13030566.1482751444000@Atlassian.JIRA>
References: <JIRA.13030566.1482751444000@Atlassian.JIRA> <JIRA.13030566.1482751444558@arcas>
Subject: [jira] [Created] (IGNITE-4491) Commutation loss between two nodes
 leads to hang whole cluster.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 26 Dec 2016 11:25:00 -0000

Vladislav Pyatkov created IGNITE-4491:
-----------------------------------------

             Summary: Communication loss between two nodes leads to a hang of the whole cluster.
                 Key: IGNITE-4491
                 URL: https://issues.apache.org/jira/browse/IGNITE-4491
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 1.8
            Reporter: Vladislav Pyatkov
            Priority: Critical


Reproduction steps:
1) Start nodes:

DC1                       DC2

1 (10.116.172.1)      8 (10.116.64.11)
2 (10.116.172.2)      7 (10.116.64.12)
3 (10.116.172.3)      6 (10.116.64.13)
4 (10.116.172.4)      5 (10.116.64.14)

Each node has a client that runs on the same host as the server (see the source in the attachment).

2) Drop the connections

Between nodes 1 and 8:

1 (10.116.172.1)      8 (10.116.64.11)

Drop all incoming and outgoing traffic.
Run on 10.116.172.1:
iptables -A INPUT -s 10.116.64.11 -j DROP
iptables -A OUTPUT -d 10.116.64.11 -j DROP

Between nodes 4 and 5:

4 (10.116.172.4)      5 (10.116.64.14)

Run on 10.116.172.4:
iptables -A INPUT -s 10.116.64.14 -j DROP
iptables -A OUTPUT -d 10.116.64.14 -j DROP
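
The two pairs of rules above follow the same pattern, so they can be wrapped in a small helper. This is only a sketch and not part of the attached sources: the names run, block_peer, unblock_peer and the DRY_RUN variable are hypothetical. It also shows how to remove the rules afterwards with iptables -D, which deletes a rule matching the given specification.

```shell
#!/bin/sh
# Sketch only: build the iptables rules for blocking or unblocking one peer.
# DRY_RUN=1 (the default here) prints the commands instead of executing them;
# executing them for real requires root.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# block_peer <ip>: the same two rules as in step 2.
block_peer() {
    run iptables -A INPUT -s "$1" -j DROP
    run iptables -A OUTPUT -d "$1" -j DROP
}

# unblock_peer <ip>: delete the matching rules to restore traffic.
unblock_peer() {
    run iptables -D INPUT -s "$1" -j DROP
    run iptables -D OUTPUT -d "$1" -j DROP
}
```

For example, on 10.116.172.1 one would call block_peer 10.116.64.11 to cut the link to node 8 and unblock_peer 10.116.64.11 to restore it, with DRY_RUN=0 to actually apply the rules.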

3) Stop the grid, after several seconds

If you look at the logs after the traffic is dropped, you can see which nodes were segmented (note that the clients were not segmented):
[12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]

And all operations stopped at the same time.
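
The topology snapshot line can be checked mechanically. The following is a rough sketch, assuming the log line format shown above; snapshot_field is a hypothetical helper, and the expected server count of 8 comes from the eight server nodes started in step 1.

```shell
#!/bin/sh
# snapshot_field <name> <log line>: print the integer value of name=...
# taken from a "Topology snapshot" log line.
snapshot_field() {
    printf '%s\n' "$2" | sed -n "s/.*[^A-Za-z]$1=\([0-9][0-9]*\).*/\1/p"
}

line='[12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]'

expected_servers=8   # eight server nodes were started in step 1
servers=$(snapshot_field servers "$line")
clients=$(snapshot_field clients "$line")

echo "segmented servers: $((expected_servers - servers)), clients still in topology: $clients"
```

For the snapshot above this reports two servers missing from the topology while all eight clients remain.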

