From issues-return-98706-archive-asf-public=cust-asf.ponee.io@ignite.apache.org  Tue Sep 17 10:40:02 2019
Return-Path: <issues-return-98706-archive-asf-public=cust-asf.ponee.io@ignite.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 1D4D3180645
	for <archive-asf-public@cust-asf.ponee.io>; Tue, 17 Sep 2019 12:40:02 +0200 (CEST)
Received: (qmail 34693 invoked by uid 500); 17 Sep 2019 10:40:01 -0000
Mailing-List: contact issues-help@ignite.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:issues-help@ignite.apache.org>
List-Unsubscribe: <mailto:issues-unsubscribe@ignite.apache.org>
List-Post: <mailto:issues@ignite.apache.org>
List-Id: <issues.ignite.apache.org>
Reply-To: dev@ignite.apache.org
Delivered-To: mailing list issues@ignite.apache.org
Received: (qmail 34683 invoked by uid 99); 17 Sep 2019 10:40:01 -0000
Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Sep 2019 10:40:01 +0000
Received: from jira-he-de.apache.org (static.172.67.40.188.clients.your-server.de [188.40.67.172])
	by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id B24D4E3141
	for <issues@ignite.apache.org>; Tue, 17 Sep 2019 10:40:00 +0000 (UTC)
Received: from jira-he-de.apache.org (localhost.localdomain [127.0.0.1])
	by jira-he-de.apache.org (ASF Mail Server at jira-he-de.apache.org) with ESMTP id 2CE8C7806CB
	for <issues@ignite.apache.org>; Tue, 17 Sep 2019 10:40:00 +0000 (UTC)
Date: Tue, 17 Sep 2019 10:40:00 +0000 (UTC)
From: "Ilya Kasnacheev (Jira)" <jira@apache.org>
To: issues@ignite.apache.org
Message-ID: <JIRA.13149437.1522658846000.74751.1568716800183@Atlassian.JIRA>
In-Reply-To: <JIRA.13149437.1522658846000@Atlassian.JIRA>
References: <JIRA.13149437.1522658846000@Atlassian.JIRA> <JIRA.13149437.1522658846158@jira-he-de>
Subject: [jira] [Commented] (IGNITE-8098) Getting affinity for topology
 version earlier than affinity is calculated because of data race
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/IGNITE-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931277#comment-16931277 ]

Ilya Kasnacheev commented on IGNITE-8098:
-----------------------------------------

Possible duplicate of IGNITE-11465

> Getting affinity for topology version earlier than affinity is calculated because of data race
> ----------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-8098
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8098
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Andrey Aleksandrov
>            Priority: Minor
>             Fix For: 2.8
>
>
> From time to time, an Ignite cluster with services throws the following exception while some nodes are restarting:
> java.lang.IllegalStateException: Getting affinity for topology version earlier than affinity is calculated [locNode=TcpDiscoveryNode [id=c770dbcf-2908-442d-8aa0-bf26a2aecfef, addrs=[10.44.162.169, 127.0.0.1], sockAddrs=[clrv0000041279.ic.ing.net/10.44.162.169:56500, /127.0.0.1:56500], discPort=56500, order=11, intOrder=8, lastExchangeTime=1520931375337, loc=true, ver=2.3.3#20180213-sha1:f446df34, isClient=false], grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], head=AffinityTopologyVersion [topVer=15, minorTopVer=0], history=[AffinityTopologyVersion [topVer=11, minorTopVer=0], AffinityTopologyVersion [topVer=11, minorTopVer=1], AffinityTopologyVersion [topVer=12, minorTopVer=0], AffinityTopologyVersion [topVer=15, minorTopVer=0]]]
> The cause appears to be a data race in the GridServiceProcessor class.
> How to reproduce:
> 1) To simulate the data race, modify the following place in the source code:
> Class: GridServiceProcessor
> Method: @Override public void onEvent(final DiscoveryEvent evt, final DiscoCache discoCache) {
> Place:
> ....
> try {
>  svcName.set(dep.configuration().getName());
>  ctx.cache().internalCache(UTILITY_CACHE_NAME).context().affinity().
>  affinityReadyFuture(topVer).get();
> //HERE (between GET and REASSIGN) you should add Thread.sleep(100) for example.
> //try {
> //Thread.sleep(100);
> //}
> //catch (InterruptedException e1) {
> //e1.printStackTrace();
> //}
>
>  reassign(dep, topVer);
> }
> catch (IgniteCheckedException ex) {
>  if (!(e instanceof ClusterTopologyCheckedException))
>  LT.error(log, ex, "Failed to do service reassignment (will retry): " +
>  dep.configuration().getName());
>  retries.add(dep);
> }
> ...
> 2) After that, simulate repeated node start/shutdown iterations. To reproduce, I used GridServiceProcessorBatchDeploySelfTest (the timeout on future.get should be increased to avoid a timeout error).
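
The window described in the quoted report (between waiting on affinityReadyFuture(topVer) and calling reassign(dep, topVer)) looks like a check-then-act race. The sketch below is a toy model of that pattern only, not Ignite code: the History class, its size limit, and the checkThenAct helper are all hypothetical names invented for illustration, and the real affinity history may lose versions for different reasons. The Runnable stands in for the spot where the report adds Thread.sleep(100).

```java
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy model of a check-then-act race on a bounded version history. Not Ignite code. */
public class AffinityRaceSketch {
    /** Hypothetical bounded history: keeps only the newest maxSize versions. */
    static final class History {
        private final ConcurrentSkipListMap<Integer, String> assignments = new ConcurrentSkipListMap<>();
        private final int maxSize;

        History(int maxSize) {
            this.maxSize = maxSize;
        }

        /** Records an assignment for a new version and evicts the oldest entries. */
        void onTopologyChange(int topVer) {
            assignments.put(topVer, "assignment@" + topVer);
            while (assignments.size() > maxSize)
                assignments.pollFirstEntry();
        }

        /** The "GET" step: true once this version (or a later one) has been calculated. */
        boolean isReady(int topVer) {
            return !assignments.isEmpty() && assignments.lastKey() >= topVer;
        }

        /** The "REASSIGN" step: fails if the version is no longer in the history. */
        String assignment(int topVer) {
            String res = assignments.get(topVer);
            if (res == null)
                throw new IllegalStateException(
                    "No assignment for topVer=" + topVer + ", history=" + assignments.keySet());
            return res;
        }
    }

    /** Runs the check, then betweenSteps (the race window), then the act. */
    static String checkThenAct(History hist, int topVer, Runnable betweenSteps) {
        if (!hist.isReady(topVer))
            return "not ready";

        betweenSteps.run(); // Stand-in for the added Thread.sleep(100).

        try {
            return hist.assignment(topVer);
        }
        catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // No topology change inside the window: the lookup succeeds.
        History quiet = new History(2);
        quiet.onTopologyChange(13);
        System.out.println(checkThenAct(quiet, 13, () -> { }));

        // Two topology changes inside the window evict version 13: the lookup fails.
        History busy = new History(2);
        busy.onTopologyChange(13);
        System.out.println(checkThenAct(busy, 13, () -> {
            busy.onTopologyChange(14);
            busy.onTopologyChange(15);
        }));
    }
}
```

In this model the first call returns the assignment, while the second reports a failure because the version that was checked is gone by the time it is used. Widening the window, as the Thread.sleep(100) in the report does, simply makes the second outcome more likely under node restarts.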

