ignite-dev mailing list archives

From Ivan Pavlukhin <vololo...@gmail.com>
Subject Re: Suggestion to improve deadlock detection
Date Wed, 14 Nov 2018 15:54:52 GMT

Next part, as promised. My current work item is a deadlock detector
for MVCC transactions [1]. The message is structured in two parts:
first, an analysis of the current state of affairs and the possible
options; second, the proposed option. The first part is rather long,
so some readers might prefer to skip it.

The immediate question is: why can't we use the existing deadlock
detector? The answer is the difference between the classic and MVCC
transaction implementations. Currently a collection of IgniteTxEntry
is used for detection, but such a collection is not maintained for
MVCC transactions, so the existing detector will not work out of the box.
Also, the current distributed iterative approach cannot be low-latency
in the worst case, because it may issue many network requests
sequentially.
So, what options do we have? Generally, we have to choose between a
centralized and a distributed approach. By a centralized approach I
mean a dedicated deadlock detector located on a single node. In the
centralized approach we can face failover difficulties, as the node
running the deadlock detector can fail. In the distributed approach,
extra network messaging overhead can hurt, because different nodes
participating in a deadlock can start detection independently and send
redundant messages. Several aspects matter when choosing an
implementation; here they are, with the (roughly speaking) better
approach in parentheses:
* Detection latency (centralized).
* Messaging overhead (centralized).
* Failover (distributed).
Also, maintaining a fleet of different deadlock detectors does not
sound very appealing. I hope it is possible to develop a common
solution suitable for both kinds of transactions. I suggest piloting
the new solution with MVCC transactions and then adapting it for
classic ones.

I propose to start with the centralized algorithm described by
Vladimir at the beginning of the thread. I will try to outline its
main points:
1. A single deadlock detector exists in the cluster and maintains the
transaction wait-for graph (WFG).
2. Each cluster node sends new wait-for edges to the detector and
invalidates stale ones.
3. The detector periodically searches for cycles in the WFG and, if a
cycle is found, chooses and aborts a victim transaction.
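To make the three points concrete, here is a minimal sketch of what such a detector's core data structure might look like (plain Java, purely illustrative; the class and method names are my assumptions, not Ignite APIs):

```java
import java.util.*;

/**
 * Toy sketch of the wait-for graph (WFG) a hypothetical centralized
 * detector could maintain. Edge "waiter -> holder" means transaction
 * 'waiter' is blocked waiting for a lock held by 'holder'.
 */
class WaitForGraph {
    /** txId -> set of txIds it currently waits for. */
    private final Map<Long, Set<Long>> edges = new HashMap<>();

    /** A cluster node reports that tx 'waiter' waits for tx 'holder'. */
    void addEdge(long waiter, long holder) {
        edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    /** A node invalidates an edge (lock acquired or tx finished). */
    void removeEdge(long waiter, long holder) {
        Set<Long> out = edges.get(waiter);
        if (out != null)
            out.remove(holder);
    }

    /**
     * Periodic cycle search (step 3): returns one transaction lying on
     * a cycle (a victim candidate), or null if the graph is acyclic.
     */
    Long findVictim() {
        Set<Long> done = new HashSet<>();
        for (Long start : edges.keySet()) {
            Long victim = dfs(start, new HashSet<>(), done);
            if (victim != null)
                return victim;
        }
        return null;
    }

    private Long dfs(Long tx, Set<Long> onPath, Set<Long> done) {
        if (onPath.contains(tx))
            return tx;          // Cycle closed at 'tx'.
        if (done.contains(tx))
            return null;        // Already proven cycle-free.
        onPath.add(tx);
        for (Long next : edges.getOrDefault(tx, Collections.emptySet())) {
            Long victim = dfs(next, onPath, done);
            if (victim != null)
                return victim;
        }
        onPath.remove(tx);
        done.add(tx);
        return null;
    }
}
```

Here edge reporting and invalidation are plain method calls; in the real system they would arrive as messages from cluster nodes, possibly out of order, which is where the concurrency concerns come from.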

Currently I have one fundamental question: is there a possibility of
falsely detected deadlocks because of concurrent WFG updates?
Of course, there are many points for improvement and optimization, but
I would like to start by discussing the key points.

Please share your thoughts!

[1] https://issues.apache.org/jira/browse/IGNITE-9322
On Wed, 14 Nov 2018 at 15:47, ipavlukhin <vololo100@gmail.com> wrote:
> Hi Igniters,
> I would like to resume the discussion about a deadlock detector. I will
> start with the motivation for further work on the subject. As I see it,
> the current implementation (entry point IgniteTxManager.detectDeadlock)
> starts detection only after a transaction has timed out. To my mind,
> this is not very good from a product usability standpoint. As you know,
> in a deadlock situation some keys become unusable for an infinite amount
> of time. Currently the only way to work around it is to configure a
> timeout, but in practice it can be rather tricky to choose a
> proper/universal value for it. So, I see the main point as:
> The ability to break deadlocks without a need to configure timeouts explicitly.
> I will return soon with some thoughts about implementation. Meanwhile,
> does anybody have in mind any other usability points which I am missing?
> Or are there any alternative approaches?
> On 2017/11/21 08:32:02, Dmitriy Setrakyan <d...@apache.org> wrote:
>  > On Mon, Nov 20, 2017 at 10:15 PM, Vladimir Ozerov <vo...@gridgain.com>
>  > wrote:
>  >
>  > > It doesn’t need all txes. Instead, other nodes will send info about
>  > > suspicious txes to it from time to time.
>  > >
>  >
>  > I see your point, I think it might work.
>  >

Best regards,
Ivan Pavlukhin
