Date: Tue, 17 Sep 2019 10:40:00 +0000 (UTC)
From: "Ilya Kasnacheev (Jira)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Commented] (IGNITE-8098) Getting affinity for topology version earlier than affinity is calculated because of data race

    [ https://issues.apache.org/jira/browse/IGNITE-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931277#comment-16931277 ]

Ilya Kasnacheev commented on IGNITE-8098:
-----------------------------------------

Possible duplicate of IGNITE-11465

> Getting affinity for topology version earlier than affinity is calculated because of data race
> ----------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-8098
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8098
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Andrey Aleksandrov
>            Priority: Minor
>             Fix For: 2.8
>
>
> From time to time, an Ignite cluster with services throws the following exception while some nodes are restarting:
> java.lang.IllegalStateException: Getting affinity for topology version earlier than affinity is calculated [locNode=TcpDiscoveryNode [id=c770dbcf-2908-442d-8aa0-bf26a2aecfef, addrs=[10.44.162.169, 127.0.0.1], sockAddrs=[clrv0000041279.ic.ing.net/10.44.162.169:56500, /127.0.0.1:56500], discPort=56500, order=11, intOrder=8, lastExchangeTime=1520931375337, loc=true, ver=2.3.3#20180213-sha1:f446df34, isClient=false], grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], head=AffinityTopologyVersion [topVer=15, minorTopVer=0], history=[AffinityTopologyVersion [topVer=11, minorTopVer=0], AffinityTopologyVersion [topVer=11, minorTopVer=1], AffinityTopologyVersion [topVer=12, minorTopVer=0], AffinityTopologyVersion [topVer=15, minorTopVer=0]]]
> It looks like the cause of this issue is a data race in the GridServiceProcessor class.
> How to reproduce:
> 1) To simulate the data race, modify the following place in the source code:
> Class: GridServiceProcessor
> Method: @Override public void onEvent(final DiscoveryEvent evt, final DiscoCache discoCache) {
> Place:
> ....
> try {
>     svcName.set(dep.configuration().getName());
>     ctx.cache().internalCache(UTILITY_CACHE_NAME).context().affinity().
>         affinityReadyFuture(topVer).get();
>     // HERE (between GET and REASSIGN) you should add Thread.sleep(100), for example.
>     //try {
>     //Thread.sleep(100);
>     //}
>     //catch (InterruptedException e1) {
>     //e1.printStackTrace();
>     //}
>
>     reassign(dep, topVer);
> }
> catch (IgniteCheckedException ex) {
>     if (!(e instanceof ClusterTopologyCheckedException))
>         LT.error(log, ex, "Failed to do service reassignment (will retry): " +
>             dep.configuration().getName());
>     retries.add(dep);
> }
> ...
> 2) After that, imitate start/shutdown iterations. To reproduce the issue I used GridServiceProcessorBatchDeploySelfTest (but the timeout on future.get should be increased to avoid a timeout error).

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
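
The window described in the report (a successful wait on the affinity future, a pause, then reassign(dep, topVer)) is a check-then-act sequence. The sketch below is only an illustration of that pattern, not Ignite code: AffinityRaceSketch, History, onCalculated, isReady, affinityFor and replay are hypothetical names, and the model assumes the requested version drops out of a bounded history while the caller is paused. The message itself does not confirm that this is the exact mechanism inside Ignite. The interleaving is replayed sequentially so the result is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of a bounded affinity history; not Ignite code. */
public class AffinityRaceSketch {
    /** Keeps only the newest maxSize topology versions (an assumption of this sketch). */
    static final class History {
        private final Deque<Integer> versions = new ArrayDeque<>();
        private final int maxSize;

        History(int maxSize) {
            this.maxSize = maxSize;
        }

        /** Records a newly calculated version and drops the oldest ones. */
        synchronized void onCalculated(int topVer) {
            versions.addLast(topVer);
            while (versions.size() > maxSize)
                versions.removeFirst();
        }

        /** Step 1 ("GET"): affinity is considered ready once head >= topVer. */
        synchronized boolean isReady(int topVer) {
            return !versions.isEmpty() && versions.peekLast() >= topVer;
        }

        /** Step 2 ("REASSIGN"): fails if topVer is no longer in the history. */
        synchronized String affinityFor(int topVer) {
            if (!versions.contains(topVer))
                throw new IllegalStateException(
                    "Getting affinity for topology version earlier than affinity is calculated"
                        + " [topVer=" + topVer + ", head=" + versions.peekLast()
                        + ", history=" + versions + ']');
            return "assignment@" + topVer;
        }
    }

    /** Replays the racy interleaving step by step and reports the outcome. */
    public static String replay() {
        History hist = new History(2);
        hist.onCalculated(12);
        hist.onCalculated(13);

        int topVer = 13;
        boolean ready = hist.isReady(topVer); // the wait on the future succeeds

        // The window where the report inserts Thread.sleep(100):
        // nodes restart and newer versions push 13 out of the history.
        hist.onCalculated(14);
        hist.onCalculated(15);

        try {
            return ready ? hist.affinityFor(topVer) : "not ready";
        }
        catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

Each method is synchronized, yet the sequence still fails, because nothing protects the gap between the readiness check and the later lookup. A fix in this toy model would perform both steps under one lock or re-read the current version before the lookup. Whether either approach fits GridServiceProcessor is a question for the linked issues.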