Date: Tue, 8 Sep 2015 07:49:45 +0000 (UTC)
From: "Semen Boikov (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Updated] (IGNITE-1123) Instability and broken topology when multiple server and client nodes are restarted

     [ https://issues.apache.org/jira/browse/IGNITE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Semen Boikov updated IGNITE-1123:
---------------------------------
    Issue Type: Sub-task  (was: Bug)
    Parent: IGNITE-1345

> Instability and broken topology when multiple server and client nodes are restarted
> -----------------------------------------------------------------------------------
>
> Key: IGNITE-1123
> URL: https://issues.apache.org/jira/browse/IGNITE-1123
> Project: Ignite
> Issue Type: Sub-task
> Components: clients, general
> Affects Versions: sprint-7
> Reporter: Denis Magda
> Assignee: Denis Magda
> Priority: Critical
> Fix For: ignite-1.5
>
>
> The bug is always reproduced with TcpDiscoveryMultiThreadedTest.testMultiThreadedClientsServersRestart.
> The test starts multiple servers and clients and then restarts them from multiple threads. At some point it will lead to one or all of the following:
> 1) Broken topology on the client side:
> {noformat}
> java.lang.AssertionError: TcpDiscoveryNodeAddFinishedMessage [nodeId=70576075-b528-43f4-b490-33d079dc7007, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=8f7e19c8e41-10b88275-1868-4faf-9ae0-d61d627b1001, verifierNodeId=10b88275-1868-4faf-9ae0-d61d627b1001, topVer=89, pendingIdx=0, isClient=false]]
> 	at org.apache.ignite.spi.discovery.tcp.ClientImpl.updateTopologyHistory(ClientImpl.java:589)
> 	at org.apache.ignite.spi.discovery.tcp.ClientImpl.access$2500(ClientImpl.java:48)
> 	at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processNodeAddFinishedMessage(ClientImpl.java:1370)
> 	at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1227)
> 	at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processClientReconnectMessage(ClientImpl.java:1552)
> 	at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1235)
> 	at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1197)
> 	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {noformat}
> 2) Client segmentation that is not properly handled by GridCachePartitionExchangeManager, which causes the test to hang:
> {noformat}
> Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false, reassign=false, discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=60b736c0-a2ad-4465-a3c1-7656e1fa9006, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:0], discPort=0, order=255, intOrder=0, loc=true, ver=1.4.1#19700101-sha1:00000000, isClient=true], topVer=255, nodeId8=60b736c0, msg=null, type=NODE_JOINED, tstamp=1436877471568], rcvdIds=GridConcurrentHashSet [elements=[7098ffd9-f81b-40bf-9b9e-b0935d394007, 30b30924-a9e7-45fb-9aeb-361bbb482003, 00d7e953-ce8b-45e0-a1f3-be7a6dea1000, 301559de-a129-4b85-852f-f8325649f003, 7078cd7a-e6b9-4bed-b829-a2e792a0c007, 20de002e-7e98-4e55-b25f-d873e25db002, 00c444ad-b221-4695-9a6d-5ea529779000, 40dc3691-7b20-41d7-a436-65ad27f74004, 308f0e4a-507a-4da6-b086-bdecc08e1003, 20b1d488-7aa1-41d7-ac0b-e8730cccc002, 00da0ab2-9441-4cc2-b787-c34dcf6a2000, 4096e7dd-e3fe-4704-9d11-3b267430e004, 1059ba84-4ca7-4d8f-9563-b90334d48001]], rmtIds=[30b30924-a9e7-45fb-9aeb-361bbb482003, 20de002e-7e98-4e55-b25f-d873e25db002, 4096e7dd-e3fe-4704-9d11-3b267430e004, 00c444ad-b221-4695-9a6d-5ea529779000], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=255, minorTopVer=0], nodeId=60b736c0, evt=NODE_JOINED], init=true, ready=false, replied=false, added=true, initFut=GridFutureAdapter [resFlag=2, res=true, startTime=1436877471578, endTime=1436877471578, ignoreInterrupts=false, lsnr=null, state=DONE], topSnapshot=null, lastVer=null, partReleaseFut=null, skipPreload=true, clientOnlyExchange=true, oldest=20de002e-7e98-4e55-b25f-d873e25db002, oldestOrder=254, evtLatch=0, remaining=[], super=GridFutureAdapter [resFlag=0, res=null, startTime=1436877471578, endTime=0, ignoreInterrupts=false, lsnr=null, state=INIT]]]
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)