From dev-return-43605-archive-asf-public=cust-asf.ponee.io@ignite.apache.org  Wed Dec 19 13:51:37 2018
Return-Path: <dev-return-43605-archive-asf-public=cust-asf.ponee.io@ignite.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 7BFE4180675
	for <archive-asf-public@cust-asf.ponee.io>; Wed, 19 Dec 2018 13:51:36 +0100 (CET)
Received: (qmail 18378 invoked by uid 500); 19 Dec 2018 12:51:35 -0000
Mailing-List: contact dev-help@ignite.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@ignite.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@ignite.apache.org>
List-Post: <mailto:dev@ignite.apache.org>
List-Id: <dev.ignite.apache.org>
Reply-To: dev@ignite.apache.org
Delivered-To: mailing list dev@ignite.apache.org
Received: (qmail 18333 invoked by uid 99); 19 Dec 2018 12:51:34 -0000
Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 19 Dec 2018 12:51:34 +0000
Received: from localhost (localhost [127.0.0.1])
	by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 81772180A62
	for <dev@ignite.apache.org>; Wed, 19 Dec 2018 12:51:34 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org
X-Spam-Flag: NO
X-Spam-Score: 0.153
X-Spam-Level:
X-Spam-Status: No, score=0.153 tagged_above=-999 required=6.31
	tests=[DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
	DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
	FREEMAIL_ENVFROM_END_DIGIT=0.25, FROM_EXCESS_BASE64=0.105,
	RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=disabled
Authentication-Results: spamd3-us-west.apache.org (amavisd-new);
	dkim=pass (2048-bit key) header.d=gmail.com
Received: from mx1-lw-eu.apache.org ([10.40.0.8])
	by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024)
	with ESMTP id qOr1QoZLuyp5 for <dev@ignite.apache.org>;
	Wed, 19 Dec 2018 12:51:30 +0000 (UTC)
Received: from mail-ot1-f51.google.com (mail-ot1-f51.google.com [209.85.210.51])
	by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 01DC55F5B6
	for <dev@ignite.apache.org>; Wed, 19 Dec 2018 12:51:29 +0000 (UTC)
Received: by mail-ot1-f51.google.com with SMTP id w25so18936490otm.13
        for <dev@ignite.apache.org>; Wed, 19 Dec 2018 04:51:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :content-transfer-encoding;
        bh=L+88PCQpYRye5804cw1EMOVA8M8zYKbEs8Ngkpem26w=;
        b=H/f+hIkxamCQeE70wXSuMnl0P3qHN7+Xcwqd1CGRk4L2Ui30ukm0pqz74+jnHiHH4u
         aj2uBXlhYPWesC7ChjdTxq2WramB5EAXy9IiS2X9Sp8hteD0BjcfRKAvGjNZnD2ngsex
         /5thjULaD+xaFGse1CeAxj3gKOMRAJDk/SDEtJSyaMrR0D5+94X1nAbbehTc81b+yjf/
         Ce/avUr1wnFfGvIjj0LgqgW0HNia0HuvjK9miyDdpvmPp5sUCD7AtjPs9/i9BoHh/OYU
         11D7y6vICT4odyzxpFscIvfGLBW7meNHysANT57cLvca1ForMmkb6tM1RCEUlSskwsfF
         D11A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:content-transfer-encoding;
        bh=L+88PCQpYRye5804cw1EMOVA8M8zYKbEs8Ngkpem26w=;
        b=fFyvH8RAnQercaiThfMH6K8+nrkKEkmEjIsajGel+iflX92M6sRE2qxBQwi8AZZXYJ
         CgYZ9JN32etAN9z6krjk5XVuav9bKs5ljcJQT1rmd55MWrTNlqRW3TngxUKRvM8C9Epi
         qk9NPH4o9MG1eHjdTxpJMPyg0/9myUZEMqX7ue9faguVVeX237Jvlv/oBpwiPUzESy05
         cJ8S4OQc2/KAIdc2TNNAoQqWJI9K+NtWbgu7N5A8DaS8bNdgpzrj3DOHDToRml5QRNg3
         LZIQILQBMPhz1zNf28etaGvBQNP4VO+YjqRNOIqMUHh4vmMTFk9VaOSOYZeeN1t6b+Vw
         DG5A==
X-Gm-Message-State: AA+aEWbHPSIJNf4dqqkrn3sRAViOgZaJddAzYtHhJiKFyip+YMqgt9qa
	WD8YjdG5MsTxr/4Qw0JkiV0lfhkWQWK5JmBHzD+HPQ==
X-Google-Smtp-Source: AFSGD/XqQrEqkGZTT2c61XHtfusCILs/H+Kl5C2X8YdkjKEs868ZiPzbUff/5RTXrxr4rqwyaCLFhb1rvUrnbr0UFuI=
X-Received: by 2002:a9d:3ba5:: with SMTP id k34mr13300002otc.364.1545223888223;
 Wed, 19 Dec 2018 04:51:28 -0800 (PST)
MIME-Version: 1.0
References: <CA+0=VoXSsNf1_yP_HwJZq5iNSwHrNCN2cxR+U_vYNUqHkdRYdA@mail.gmail.com>
 <81725235-4950-0c5f-be00-0ede5830fe25@gmail.com> <CAOykqKcgsHFPem_6g=bfxr3aAQA_CKBrWFhwLfZfycvJZ=GzMg@mail.gmail.com>
 <CAGt6o8kmiUW-Sr90JPHTSZPk3EyhBgrkxS_H5b1rGBj2gK1DLQ@mail.gmail.com>
 <CAOykqKcKYCADeKDNRAZ_G=qKYK5c2MyqXkZ0A4PmgV_xWR+eQQ@mail.gmail.com>
 <CAGt6o8kWgrwdiFFmO824jVz6MWeg62-J_iEtPXMPCMVJh58gGg@mail.gmail.com>
 <CAOykqKdtY_h--q+zQsh_aucRocHWks38TkuSuhZPWnto-zmsvw@mail.gmail.com>
 <CAGt6o8=BJy+LV-=Qp4fYHKdwRj01L3W-tN1CJYB4VdBXy8gejA@mail.gmail.com>
 <CAOykqKc51rq8=gy-86aRbcrx_v=gx=OzOJdWekqP5J=02+rQ6A@mail.gmail.com>
 <CAOykqKf=NL5X-Bia6m2EgpQ=SrmG=CJtbTPMkL_o73NDmmrYyw@mail.gmail.com> <CAAoNTuF80jjhMk9eCToVzakiLms8-z=j3K4MXqpqVDRq+MUWnw@mail.gmail.com>
In-Reply-To: <CAAoNTuF80jjhMk9eCToVzakiLms8-z=j3K4MXqpqVDRq+MUWnw@mail.gmail.com>
From: =?UTF-8?B?0J/QsNCy0LvRg9GF0LjQvSDQmNCy0LDQvQ==?= <vololo100@gmail.com>
Date: Wed, 19 Dec 2018 15:51:16 +0300
Message-ID: <CAOykqKeWuM3RSJnBgG+KvjWERiKqcAbtMTOtLxpv6NevYuuN5Q@mail.gmail.com>
Subject: Re: Suggestion to improve deadlock detection
To: dev@ignite.apache.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Igor,

I see your points. And I agree to start with "lightweight
implementation". Today we have 2PL and there is no activity on
implementing rollback to savepoint. And if we do it in the future we
will have to return to the subject of deadlock detection anyway.

I will proceed with "forward-only" approach.
=D0=B2=D1=82, 18 =D0=B4=D0=B5=D0=BA. 2018 =D0=B3. =D0=B2 18:33, Seliverstov=
 Igor <gvvinblade@gmail.com>:
>
> Ivan,
>
> I would prefer forward-only implementation even knowing it allows false
> positive check results.
>
> Why I think so:
>
> 1) From my experience, when we have any future is waiting for reply, we
> have to take a failover into consideration.
> Usually failover implementations are way more complex than an initial
> algorithm itself.
> Forward-only version doesn't need any failover implementation as it expec=
ts
> failed probes (the probes didn't face a deadlock ).
>
> 2) Simple lightweight feature implementation is a really good point to
> start - it may be extended if needed but really often such extending
> doesn't need at all.
>
> 3) Any implementation allow false positive result - we are working with
> distributed system, there are a bunch of processes happens at the same ti=
me
> and,
> for example, concurrently happened rollback on timeout may solve a deadlo=
ck
> but probe is finished at this time, so- there is a rollback on deadlock
> also as a result.
>
> 4) The described case (when false positive result is probable) isn't usua=
l
> but, potentially, extremely rare, I don't think we should solve it since =
it
> doesn't break the grid.
>
> Regards,
> Igor
>
> =D0=B2=D1=82, 18 =D0=B4=D0=B5=D0=BA. 2018 =D0=B3. =D0=B2 17:57, =D0=9F=D0=
=B0=D0=B2=D0=BB=D1=83=D1=85=D0=B8=D0=BD =D0=98=D0=B2=D0=B0=D0=BD <vololo100=
@gmail.com>:
>
> > Hi folks,
> >
> > During implementation of edge-chasing deadlock detection algorithm in
> > scope of [1] it has been realized that basically we have 2 options for
> > "chasing" strategy. I will use terms Near when GridNearTxLocal is
> > assumed and Dht when GridDhtTxLocal (tx which updates primary
> > partition). So, here is 2 mentioned strategies:
> > 1. Send initial probe when Dht tx blocks waiting for another tx to
> > release a key. Initial probe is sent from waiting Dht tx to Near tx
> > holding a lock. If receiving Near tx is blocked as well then it relays
> > the probe to Dht tx it awaits response from. So, the probe is
> > traveling like Dht -> Near -> Dht -> Near -> ... Let's call the
> > approach "forward-only".
> > 2. Send probes only between Near transactions. This approach requires
> > additional request-response call which Near tx issues to Dht node to
> > check if tx sending a request is waiting for another tx. The call
> > returns identifier of a Near tx blocking tx sending a request (if
> > sending tx is in fact blocked). Then waiting Near tx relays a probe to
> > blocked Near tx. Let's call that approach "lock-checking".
> >
> > I would like to define several points to consider while comparing
> > approaches (please point out more if I miss something):
> > 1. Correctness. Especially regarding reporting false deadlocks.
> > 2. Extensibility. Rollback to savepoint and generalization for classic
> > transactions should be considered.
> > 3. Messaging overhead.
> > 4. Simplicity of implementation.
> >
> > You can find an implementation of "lock-checking" approach in PR
> > attached to the ticket [1]. Currently it works correctly only for
> > transactions obeying strict two-phase locking (2PL). It is fairly easy
> > to adopt PR to implement "forward-only" approach. Which will work
> > correct in conditions of 2PL. "lock-checking" algorithm uses 1.5 times
> > more messages than "forward-only". Implementation of "forward-only"
> > algorithm is simpler in a sense that it does not require introducing
> > lock-wait-check messages and future. But as for me it does not look as
> > big deal. I think that implementation efforts for both approaches are
> > almost equal so far.
> >
> > But corner stone here is an extensibility. Let's imagine that we are
> > going to implement rollback to savepoint. One suggested approach to
> > extend "forward-only" approach for that case is invalidating sent
> > probe. Basically, a tx initiated deadlock check assigns unique id to a
> > probe. And this probe id is invalidated when the tx wakes up from
> > wait. If that tx receives back a probe with invalidated id it simply
> > discards it. If id is not invalidated it means a deadlock. But false
> > deadlock still can be detected. I will provide an example when it does
> > not work correctly.
> >
> > Please note, that our transactions can request multiple locks
> > simultaneously (so-called AND model). Also rollback to savepoint
> > releases locks obtained after creating a savepoint. Let's use
> > following notation. "*" means savepoint. "^" means rollback to
> > previous savepoint. "!" means acquiring a lock. "?" means waiting for
> > a lock. "&" means requesting multiple locks simultaneously. See the
> > following transaction execution scenario for 3 transactions and 3
> > keys:
> > TA        | TB     |   TC
> >             |          |   *
> >             |  1!      |  2!
> > 1? & 3! |           |
> >             |  2?     |
> >             |          |  ^
> >             |          |  3!
> >
> > We can notice that suggested scenario does not introduce deadlock
> > because locks are requested in consistent order among all
> > transactions. But in different moments there exist waiting relations
> > TA w TB, TB w TC, TC w TA. So, we suspect a false deadlock could be
> > detected here. Also one can notice that a relation TC w TA is
> > established after TB w TC is destroyed. Messages on a timeline diagram
> > demonstrates how "forward-only with probe invalidation" approach can
> > detect a false deadlock here [2]. On the picture L means "lock", U
> > "unlock", W "wait". Red cross means reporting a false deadlock.
> > Intuition here is that while probe is in flight one wait-for edge can
> > be destroyed (TB w TC) and another one can be created (TC w TA). So,
> > the approach can report false deadlocks. Also, note that a false
> > deadlock can be found even when locking order is consistent!
> >
> > After a while I will describe how "lock-checking" can be adopted to
> > support rollback to savepoint. Disclaimer here is that it will require
> > even more messages.
> > ----
> >
> > I think that we can already start a discussion.
> > Actually, while false deadlocks can be surprising we perhaps can
> > tolerate it because our transaction implementation assumes that
> > transactions can be rolled back by system. "forward-only" approach
> > requires lesser messages but in big-O terms both approaches have the
> > same complexity. Both correctness and complexity impact unfortunately
> > have hardly predictable impact for real workflows. In ideal world
> > thorough testing could give the answers.
> >
> > Please, share your vision.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-9322
> > [2]
> > https://gist.githubusercontent.com/pavlukhin/c8c7c6266eeab56048c31f5cdf=
b31d20/raw/7d1aef9abcd014ec9fdf69840168768ced638b6c/msg_diagram.jpg
> > =D0=BF=D0=BD, 26 =D0=BD=D0=BE=D1=8F=D0=B1. 2018 =D0=B3. =D0=B2 15:27, =
=D0=9F=D0=B0=D0=B2=D0=BB=D1=83=D1=85=D0=B8=D0=BD =D0=98=D0=B2=D0=B0=D0=BD <=
vololo100@gmail.com>:
> > >
> > > Vladimir,
> > >
> > > I think it might work. So, if nobody minds I can start prototyping
> > > edge-chasing approach.
> > > =D0=BF=D0=BD, 26 =D0=BD=D0=BE=D1=8F=D0=B1. 2018 =D0=B3. =D0=B2 14:32,=
 Vladimir Ozerov <vozerov@gridgain.com>:
> > > >
> > > > Ivan,
> > > >
> > > > The problem is that in our system a transaction may wait for N lock=
s
> > > > simultaneously. This may form complex graphs which spread between m=
any
> > > > nodes. Now consider that I have a deadlock between 4 nodes: A -> B =
->
> > *C*
> > > > -> D -> A. I've sent a message from a and never reached D because C
> > failed.
> > > > Well, may be that is good, because some transactions will be rolled
> > back.
> > > > But when they are rolled back, another cycles from pending multi-wa=
y
> > > > deadlocks may form again. E.g. A -> B -> *E* -> D -> A. So the
> > question is
> > > > whether we will be able to detect them reliably. I think that we ma=
y
> > use
> > > > the following assumption:
> > > > 1) If data node fails, relevant transactions will be rolled-back
> > > > 2) It means that some other transactions will make at least partial
> > > > progress with locks acquisition
> > > >
> > > > So, we can introduce a kind of counter which will be advanced on ev=
ery
> > > > acquired lock on a given node. And define a rule that transaction
> > deadlock
> > > > request for the given pair (NODE_ID, COUNTER) should be requested a=
t
> > most
> > > > once. This way, even if specific deadlock check request is lost due=
 to
> > node
> > > > failure, and another deadlock has formed, then some other node will
> > > > re-trigger deadlock change request sooner or later, as it's counter
> > > > advanced.
> > > >
> > > > Makes sense?
> > > >
> > > > On Sat, Nov 24, 2018 at 5:40 PM =D0=9F=D0=B0=D0=B2=D0=BB=D1=83=D1=
=85=D0=B8=D0=BD =D0=98=D0=B2=D0=B0=D0=BD <vololo100@gmail.com>
> > wrote:
> > > >
> > > > > Hi Vladimir,
> > > > >
> > > > > Regarding fault tolerance. It seems that it is not a problem for
> > > > > edge-chasing approaches. A found deadlock is identified by a mess=
age
> > > > > returned to a detection initiator with initiator's identifier. If
> > > > > there is no deadlock then such message will not come. If some nod=
e
> > > > > containing a deadlocked transaction fails then it will break the
> > > > > deadlock. Am I missing something?
> > > > >
> > > > > About messaging overhead. Indeed, it looks like edge-chasing
> > > > > approaches can bring redundant messages. Perhaps, we can borrow s=
ome
> > > > > ideas about optimization from [1]. And I also think about a timeo=
ut
> > > > > before starting a deadlock detection.
> > > > >
> > > > > Thoughts about adaptive timeouts lead me to some thoughts about
> > > > > monitoring. I assume that long waiting for locks signals about no=
t
> > > > > optimal performance of the system. I think it would be great to h=
ave
> > > > > means of monitoring transactions contention. It could help users =
to
> > > > > improve theirs workloads. It also could help us to build some
> > > > > reasoning how contention correlates with deadlock detection overh=
ead.
> > > > >
> > > > > [1] http://mentallandscape.com/Papers_podc84.pdf
> > > > > =D0=B2=D1=81, 18 =D0=BD=D0=BE=D1=8F=D0=B1. 2018 =D0=B3. =D0=B2 10=
:52, Vladimir Ozerov <vozerov@gridgain.com
> > >:
> > > > > >
> > > > > > Hi Ivan,
> > > > > >
> > > > > > Great analysis. Agree that edge-chasing looks like better
> > candidate.
> > > > > First,
> > > > > > it will be applicable to both normal and MVCC transactions.
> > Second, in
> > > > > MVCC
> > > > > > we probably will also need to release some locks when doing
> > rollbacks.
> > > > > What
> > > > > > we should think about is failover - what if a node which was in=
 the
> > > > > middle
> > > > > > of a graph fails? We need to craft fault-tolerance carefully.
> > > > > >
> > > > > > Another critically important point is how to trigger deadlock
> > detection.
> > > > > My
> > > > > > concerns about edge-chasing was not about latency, but about a
> > number of
> > > > > > messages which travels between nodes. Good algorithm must produ=
ce
> > little
> > > > > to
> > > > > > no messages in case of normal contenion while still providing
> > protection
> > > > > in
> > > > > > case of real deadlocks. So how would we trigger fraph traversal
> > for the
> > > > > > given transaction? May be we can start with hard timeout and th=
en
> > employ
> > > > > a
> > > > > > kind of adaptive increment in case high rate of false-positives=
 are
> > > > > > observed. May be something else.
> > > > > >
> > > > > > On Sun, Nov 18, 2018 at 10:21 AM =D0=9F=D0=B0=D0=B2=D0=BB=D1=83=
=D1=85=D0=B8=D0=BD =D0=98=D0=B2=D0=B0=D0=BD <
> > vololo100@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Vladimir,
> > > > > > >
> > > > > > > Thanks for the articles! I studied them and a couple of other=
s.
> > And I
> > > > > > > would like to share a knowledge I found.
> > > > > > >
> > > > > > > BACKGROUND
> > > > > > > First of all our algorithm implemented in
> > > > > > >
> > > > > > >
> > > > >
> > org.apache.ignite.internal.processors.cache.transactions.TxDeadlockDete=
ction
> > > > > > > is not an edge-chasing algorithm. In essence a lock-waiting
> > > > > > > transaction site polls nodes responsible for keys of interest
> > one by
> > > > > > > one and reconstructs global wait-for graph (WFG) locally.
> > > > > > > A centralized algorithm discussed in this thread looks simila=
r to
> > > > > > > algorithms proposed by Ho [1]. The simplest of them (two-phas=
ed)
> > > > > > > reports false deadlocks when unlock before transaction finish=
 is
> > > > > > > permitted. So, it seems that it only works when strict two-ph=
ase
> > > > > > > locking (2PL) is obeyed. Another one (called one-phased) requ=
ires
> > > > > > > tracking all locked keys by each transaction which is not
> > desirable
> > > > > > > for MVCC transactions.
> > > > > > > Aforementioned edge-chasing algorithm by Chandy, Misra and Ha=
as
> > [2] is
> > > > > > > proven to work even when 2PL is not obeyed.
> > > > > > > Also performance target is not clear for me. It looks like
> > centralized
> > > > > > > approaches can use fewer messages and provide lower latency t=
hat
> > > > > > > distributed ones. But would like to understand what are the
> > orders of
> > > > > > > latency and messaging overhead. Many of algorithms are descri=
bed
> > in
> > > > > > > [3] and some performance details are mentioned. It is said "A=
s
> > per
> > > > > > > [GRAY81], most deadlocks involve two to four transactions." I
> > see one
> > > > > > > more argument making deadlock detection algorithm latency not=
 so
> > > > > > > important. We can consider deadlock as unavoidable situation,
> > but even
> > > > > > > lightning fast detector will not save a database from perform=
ance
> > > > > > > degradation in case of tons of deadlocks.
> > > > > > > What is also interesting, Google Cloud Spanner employs simple
> > > > > > > "wound-wait" deadlock prevention [4] instead of any detection
> > > > > > > algorithm.
> > > > > > >
> > > > > > > DISCUSSION
> > > > > > > So, I see the following options:
> > > > > > > 1. Edge-chasing algorithm like Chandy, Misra and Haas [2].
> > > > > > > 2. Centralized two-phased algorithm described by Ho [1] with
> > > > > > > restriction that transactions must obey 2PL.
> > > > > > > Personally, I tend to edge-chasing approach because it looks
> > simple
> > > > > > > and universal. Restricting ourselves with 2PL will restrict u=
s in
> > > > > > > features which could be implemented in future (e.g. transacti=
on
> > > > > > > savepoints), will not it?
> > > > > > >
> > > > > > > WDYT?
> > > > > > >
> > > > > > > Ivan
> > > > > > >
> > > > > > > [1]
> > https://turing.cs.hbg.psu.edu/comp512.papers/HoRamamoorthy-82.pdf
> > > > > > > [2]
> > > > > > >
> > > > >
> > https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetec=
tion.pdf
> > > > > > > [3] https://www.ics.uci.edu/~cs237/reading/p37-elmagarmid.pdf
> > > > > > > [4]
> > > > > > >
> > > > >
> > https://cloud.google.com/spanner/docs/transactions#rw_transaction_perfo=
rmance
> > > > > > >
> > > > > > > =D1=81=D1=80, 14 =D0=BD=D0=BE=D1=8F=D0=B1. 2018 =D0=B3. =D0=
=B2 22:13, Vladimir Ozerov <
> > vozerov@gridgain.com>:
> > > > > > > >
> > > > > > > > Ivan,
> > > > > > > >
> > > > > > > > This is interesting question. I think we should spend some
> > time for
> > > > > > > formal
> > > > > > > > verification whether this algorithm works or not. Several
> > articles
> > > > > you
> > > > > > > may
> > > > > > > > use as a startitng point: [1], [2]. From what I understand,
> > Ignite
> > > > > fall
> > > > > > > > into "AND" model, and currently implemented algorithm is a
> > variation
> > > > > of
> > > > > > > > "edge-chasing" approach as per Chandy, Misra and Haas [3],
> > which is
> > > > > > > *proven
> > > > > > > > to be correct* in that it both detects deadlocks when they =
are
> > > > > present,
> > > > > > > and
> > > > > > > > do not produce false positives. But is might be too heavy f=
or
> > the
> > > > > system
> > > > > > > > under contention.
> > > > > > > >
> > > > > > > > We need to search for any formal proof of correctness of
> > proposed
> > > > > > > > algorithm. This area is already researched throughly enough=
,
> > so we
> > > > > should
> > > > > > > > be able to find an answer quickly.
> > > > > > > >
> > > > > > > > Vladimir.
> > > > > > > >
> > > > > > > > [1] http://www.cse.scu.edu/~jholliday/dd_9_16.htm
> > > > > > > > [2] https://www.cs.uic.edu/~ajayk/Chapter10.pdf
> > > > > > > > [3]
> > > > > > > >
> > > > > > >
> > > > >
> > https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetec=
tion.pdf
> > > > > > > >
> > > > > > > > On Wed, Nov 14, 2018 at 6:55 PM =D0=9F=D0=B0=D0=B2=D0=BB=D1=
=83=D1=85=D0=B8=D0=BD =D0=98=D0=B2=D0=B0=D0=BD <
> > vololo100@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Next part as promised. A working item for me is a deadloc=
k
> > detector
> > > > > > > > > for MVCC transactions [1]. The message is structured in 2
> > parts.
> > > > > First
> > > > > > > > > is an analysis of the current state of affairs and possib=
le
> > > > > options to
> > > > > > > > > go. Second is a proposed option. First part is going to b=
e
> > not so
> > > > > > > > > short so some might prefer to skip it.
> > > > > > > > >
> > > > > > > > > ANALYSIS
> > > > > > > > > The immediate question is "why we cannot use an existing
> > deadlock
> > > > > > > > > detector?". The differences between classic and MVCC
> > transactions
> > > > > > > > > implementation is the answer. Currently a collection of
> > > > > IgniteTxEntry
> > > > > > > > > is used for detection. But such collection is not maintai=
ned
> > for
> > > > > MVCC
> > > > > > > > > transactions. So, it will not work out of box.
> > > > > > > > > Also it looks like that current distributed iterative
> > approach
> > > > > cannot
> > > > > > > > > be low latency it the worst case because of doing possibl=
y
> > many
> > > > > > > > > network requests sequentially.
> > > > > > > > > So, what options do we have? Generally we should choose
> > between
> > > > > > > > > centralized and distributed approaches. By centralized
> > approach I
> > > > > mean
> > > > > > > > > existence of a dedicated deadlock detector located on a
> > single
> > > > > node.
> > > > > > > > > In the centralized approach we can face difficulties rela=
ted
> > to
> > > > > > > > > failover as a node running deadlock detector can fail. In=
 the
> > > > > > > > > distributed approach extra network messaging overhead can
> > strike
> > > > > > > > > because different nodes participating in a deadlock can s=
tart
> > > > > > > > > detection independently and send redundant messages. I se=
e
> > some
> > > > > > > > > aspects which make sense for choosing implementation. Her=
e
> > they are
> > > > > > > > > with an approach that is better (roughly speaking) in
> > parentheses:
> > > > > > > > > * Detection latency (centralized).
> > > > > > > > > * Messaging overhead (centralized).
> > > > > > > > > * Failover (distributed).
> > > > > > > > > And also having a park of deadlock detectors sounds not v=
ery
> > good.
> > > > > I
> > > > > > > > > hope that it is possible to develop a common solution
> > suitable for
> > > > > > > > > both kinds of transactions. I suggest to pilot new soluti=
on
> > with
> > > > > MVCC
> > > > > > > > > and then adopt it for classic transactions.
> > > > > > > > >
> > > > > > > > > PROPOSAL
> > > > > > > > > Actually I propose to start with an centralized algorithm
> > > > > described by
> > > > > > > > > Vladimir in the beginning of the thread. I will try to
> > outline main
> > > > > > > > > points of it.
> > > > > > > > > 1. Single deadlock detector exists in the cluster which
> > maintains
> > > > > > > > > transaction wait-for graph (WFG).
> > > > > > > > > 2. Each cluster node sends and invalidates wait-for edges=
 to
> > the
> > > > > > > detector.
> > > > > > > > > 3. The detector periodically searches cycles in WFG and
> > chooses and
> > > > > > > > > aborts a victim transaction if cycle is found.
> > > > > > > > >
> > > > > > > > > Currently I have one fundamental question. Is there a
> > possibility
> > > > > of
> > > > > > > > > false detected deadlocks because of concurrent WFG update=
s?
> > > > > > > > > Of course there are many points of improvements and
> > optimizations.
> > > > > But
> > > > > > > > > I would like to start from discussing key points.
> > > > > > > > >
> > > > > > > > > Please share your thoughts!
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9322
> > > > > > > > > =D1=81=D1=80, 14 =D0=BD=D0=BE=D1=8F=D0=B1. 2018 =D0=B3. =
=D0=B2 15:47, ipavlukhin <
> > vololo100@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > Hi Igniters,
> > > > > > > > > >
> > > > > > > > > > I would like to resume the discussion about a deadlock
> > detector.
> > > > > I
> > > > > > > start
> > > > > > > > > > with a motivation for a further work on a subject. As I=
 see
> > > > > current
> > > > > > > > > > implementation (entry point IgniteTxManager.detectDeadl=
ock)
> > > > > starts a
> > > > > > > > > > detection only after a transaction was timed out. In my
> > mind it
> > > > > is
> > > > > > > not
> > > > > > > > > > very good from a product usability standpoint. As you
> > know, in a
> > > > > > > > > > situation of deadlock some keys become non-usable for a=
n
> > infinite
> > > > > > > amount
> > > > > > > > > > of time. Currently the only way to work around it is
> > configuring
> > > > > a
> > > > > > > > > > timeout, but it could be rather tricky in practice to
> > choose a
> > > > > > > > > > proper/universal value for it. So, I see the main point=
 as:
> > > > > > > > > >
> > > > > > > > > > Ability to break deadlocks without a need to configure
> > timeouts
> > > > > > > > > explicitly.
> > > > > > > > > >
> > > > > > > > > > I will return soon with some thoughts about implementat=
ion.
> > > > > > > Meanwhile,
> > > > > > > > > > does anybody have in mind any other usability points wh=
ich
> > I am
> > > > > > > missing?
> > > > > > > > > > Or is there any alternative approaches?
> > > > > > > > > >
> > > > > > > > > > On 2017/11/21 08:32:02, Dmitriy Setrakyan <d...@apache.=
org
> > >
> > > > > wrote:
> > > > > > > > > >  > On Mon, Nov 20, 2017 at 10:15 PM, Vladimir Ozerov <
> > > > > > > vo...@gridgain.com
> > > > > > > > > >>
> > > > > > > > > >  > wrote:>
> > > > > > > > > >  >
> > > > > > > > > >  > > It doesn=E2=80=99t need all txes. Instead, other n=
odes will
> > send
> > > > > info
> > > > > > > about>
> > > > > > > > > >  > > suspicious txes to it from time to time.>
> > > > > > > > > >  > >>
> > > > > > > > > >  >
> > > > > > > > > >  > I see your point, I think it might work.>
> > > > > > > > > >  >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best regards,
> > > > > > > > > Ivan Pavlukhin
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Ivan Pavlukhin
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >


--=20
Best regards,
Ivan Pavlukhin