From: Nikita Amelchev <nsamelchev@gmail.com>
Date: Wed, 5 Jun 2019 12:15:54 +0300
Message-ID: <CAFtZiX3JqNadVMCfnbwe3PAsgz=3b506HZ9cN1GN2o2Tte5y4g@mail.gmail.com>
Subject: Re: Lightweight version of partitions map exchange
To: dev@ignite.apache.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Maksim,

I agree with you that, for the current issue, we should not allow
lightweight PME while there are MOVING partitions in the cluster.
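
To make that rule concrete, here is a minimal sketch. It is not the actual
Ignite code: the PartState enum, the shape of the node2part map (node id to
the states of its partitions) and the method name are assumptions made only
for illustration.

```java
import java.util.List;
import java.util.Map;

public class LightweightPmeCheck {
    /** Partition states named in this thread; this enum is a stand-in, not Ignite's own type. */
    enum PartState { MOVING, OWNING, LOST }

    /**
     * Returns true only when no node reports a MOVING partition,
     * i.e. the cluster is stable and no rebalance is in progress.
     * The map shape (node id -> partition states) is hypothetical.
     */
    static boolean lightweightPmeAllowed(Map<String, List<PartState>> node2part) {
        return node2part.values().stream()
            .flatMap(List::stream)
            .noneMatch(s -> s == PartState.MOVING);
    }

    public static void main(String[] args) {
        Map<String, List<PartState>> stable = Map.of(
            "node1", List.of(PartState.OWNING, PartState.OWNING),
            "node2", List.of(PartState.OWNING));

        Map<String, List<PartState>> rebalancing = Map.of(
            "node1", List.of(PartState.OWNING),
            "node2", List.of(PartState.MOVING));

        System.out.println(lightweightPmeAllowed(stable));      // true
        System.out.println(lightweightPmeAllowed(rebalancing)); // false
    }
}
```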

However, I am now investigating an issue with finalizing update counters,
because that logic assumes that finalization happens on exchange, when all
cache updates are completed. With lightweight PME we could process update
counter gaps incorrectly and break the recently merged IGNITE-10078.

As for phase 2, correct me if I misunderstood you.
Do you suggest not moving primary partitions when rebalancing completes
(that is, not changing the affinity assignment)? In that case, nodes that
recently joined the cluster will not have primary partitions and won't get
any load after rebalancing.
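
A toy illustration of that concern follows. It is a sketch under stated
assumptions, not Ignite code: the node names, the four-partition assignment
and the helper method are hypothetical, and the first node in each
partition's owner list is treated as the primary.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrimaryLoadSketch {
    /**
     * Counts primaries per node. Assumption for this sketch:
     * the first node in each partition's owner list is the primary.
     */
    static Map<String, Integer> primariesPerNode(List<List<String>> assignment) {
        Map<String, Integer> cnt = new HashMap<>();

        for (List<String> owners : assignment)
            cnt.merge(owners.get(0), 1, Integer::sum);

        return cnt;
    }

    public static void main(String[] args) {
        // Hypothetical state after node "C" joined and finished rebalancing,
        // with primaries left where they were (affinity assignment unchanged).
        List<List<String>> primariesKept = List.of(
            List.of("A", "C"), List.of("B", "C"), List.of("A", "B"), List.of("B", "A"));

        // Same owners, but the assignment was changed so that "C" takes one primary.
        List<List<String>> primariesMoved = List.of(
            List.of("C", "A"), List.of("B", "C"), List.of("A", "B"), List.of("B", "A"));

        System.out.println(primariesPerNode(primariesKept).getOrDefault("C", 0));  // 0
        System.out.println(primariesPerNode(primariesMoved).getOrDefault("C", 0)); // 1
    }
}
```

In the first assignment the new node holds only backups, so requests routed
to primaries never reach it.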

On Thu, 30 May 2019 at 19:55, Maxim Muzafarov <maxmuzaf@gmail.com> wrote:
>
> Igniters,
>
>
> I've looked through Nikita's changes and I think that for the current issue
> [1] we should run the lightweight PME on a BLT node leave event only when
> there are no MOVING partitions in the cluster (it must be stable). This is
> how we achieve truly unlocked operations, and here are my thoughts on why.
>
> In general, as Nikita mentioned above, the existence of MOVING
> partitions in the cluster means that the rebalance procedure is
> currently running. The rebalancing node owns cache partitions locally
> and, in the background (with an additional timeout), sends the actual
> states of its local partitions to the coordinator node. So there will
> always be a lag between the partition states on the local node and those
> known to all other cluster nodes. This lag can be very large, since the
> previous #scheduleResendPartitions() is cancelled when a new cache group
> rebalance finishes. Without fair synchronization of partition states
> (that is, without a full PME), and with local affinity recalculation on a
> BLT node leave event, other nodes will in most cases mark such partitions
> as LOST, although they are in fact present in the cluster and saved on
> some node under checkpoint. As far as I can see, this cannot be solved by
> saving the transition states of such partitions on each node.
>
> As for the case when the coordinator calculates affinity and sends the
> "full map" to other nodes, I think it is better here to focus on
> designing a new lightweight PME for when the rebalancing process finishes.
> Currently a full distributed PME will occur anyway when the coordinator
> sends CacheAffinityChangeMessage, but I think we can avoid it here,
> since no new MOVING or OWNING node partition states are introduced and
> all the previous mappings are still valid. We don't need a distributed
> PME if we leave partition primaries on the nodes where they
> were and just set the correct partition statuses via a light discovery
> message.
>
> So, my plan here can be:
> Phase 1. Lightweight PME on BLT node leave on a stable cluster (no
> MOVING partitions);
> Phase 2. Lightweight PME when a BLT node finishes its rebalance procedure.
>
> Folks, Nikita,
> WDYT?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9913
>
> On Fri, 24 May 2019 at 13:31, Nikita Amelchev <nsamelchev@gmail.com> wrote:
> >
> > Hello, Igniters!
> >
> > I am working on the implementation of lightweight PME for the case of
> > a BLT node leave. [1]
> >
> > The question is whether to allow lightweight PME when the cluster
> > has MOVING partitions.
> >
> > Problems that may happen if we allow it:
> >  - Nodes may select different primary nodes from the current OWNING backups.
> >  - Some nodes may mark a partition as LOST while others mark it as OWNING.
> >
> > We can take the partition states from the node2part map. The root
> > cause of those problems is that when rebalancing ends (the last
> > message is received), the local node updates its partition state to
> > OWNING (and schedules a partitions resend). This may lead to different
> > affinity recalculations on different nodes.
> >
> > I see two solutions:
> >
> > 1. Nodes will store the "moving-owning" transition of partition states
> > until the rebalancing ends. Each node will locally recalculate the
> > affinity on a node left event.
> > 2. The coordinator will calculate affinity and send the "full map" to
> > nodes. In this case, nodes should still wait for the topology change
> > event (to get the correct topology in discovery).
> >
> > If we disallow lightweight PME when the cluster has MOVING partitions,
> > there are no problems and it works fine.
> >
> > Any thoughts?
> >
> > 1. https://issues.apache.org/jira/browse/IGNITE-9913
> >
> > On Fri, 29 Mar 2019 at 15:00, Nikita Amelchev <nsamelchev@gmail.com> wrote:
> > >
> > > Pavel,
> > > I have provided MTCGA bot status in Jira issue comments. [1]
> > >
> > > Eduard,
> > > Yes, with the current implementation there will be a distributed PME
> > > if in-memory caches are configured.
> > >
> > > 1. https://issues.apache.org/jira/browse/IGNITE-9913
> > >
> > > > On Fri, 29 Mar 2019 at 14:49, Eduard Shangareev <eduard.shangareev@gmail.com> wrote:
> > > >
> > > > Nikita,
> > > >
> > > > > It sounds cool, but I didn't get the part about in-memory caches. The
> > > > > baseline is not used for their affinity calculation.
> > > > > So this improvement would be switched off either for them or completely
> > > > > (when such caches are present), wouldn't it?
> > > >
> > > > > On Thu, Mar 28, 2019 at 3:14 PM Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > > >
> > > > > Hi Nikita,
> > > > >
> > > > > > Thank you for your work. This is a great improvement. I'll take a look at it in
> > > > > > the next couple of days. Could you please run TC and provide the MTCGA bot status
> > > > > > for this change?
> > > > >
> > > > > > On Thu, 28 Mar 2019 at 14:29, Nikita Amelchev <nsamelchev@gmail.com> wrote:
> > > > >
> > > > > > Hello, Igniters!
> > > > > >
> > > > > > > I have implemented a lightweight version of partitions map exchange for
> > > > > > > the case when a baseline node leaves the topology. [1]
> > > > > > >
> > > > > > > If partitions are assigned according to the baseline topology and a
> > > > > > > server node leaves, there is no actual need to perform a distributed PME.
> > > > > > > Every cluster node will recalculate new affinity assignments and partition
> > > > > > > states locally. There is no need to wait for partitions to be released, so
> > > > > > > PME will start immediately.
> > > > > > >
> > > > > > > I have benchmarked the duration of PME under yardstick load. PME duration
> > > > > > > decreased by up to 10 times and the maximum transaction latency
> > > > > > > decreased by up to 4-5 times. See details in the Jira issue comments. [1]
> > > > > >
> > > > > > Could some expert of PME take a look at my changes? [2]
> > > > > >
> > > > > > 1. https://issues.apache.org/jira/browse/IGNITE-9913
> > > > > > 2. https://reviews.ignite.apache.org/ignite/review/IGNT-CR-1027
> > > > > >
> > > > > > --
> > > > > > Best wishes,
> > > > > > Amelchev Nikita
> > > > > >
> > > > >
> > >
> > >
> > >
> > > --
> > > Best wishes,
> > > Amelchev Nikita
> >
> >
> >
> > --
> > Best wishes,
> > Amelchev Nikita


--
Best wishes,
Amelchev Nikita