ignite-dev mailing list archives

From Seliverstov Igor <gvvinbl...@gmail.com>
Subject Re: Suggestion to improve deadlock detection
Date Mon, 24 Dec 2018 11:56:49 GMT
Ivan,

We have to design configuration carefully.
Compatibility issues are possible, and since this API is public it
might be difficult to redesign it after a while.

Since there is a ticket about MVCC-related configuration parts [1] (TxLog
memory region/size, vacuum thread count and intervals, etc.),
I suggest solving the issue in the scope of that ticket.

For now, let's introduce a couple of system properties so the behavior
can be changed during the development phase.

[1] https://issues.apache.org/jira/browse/IGNITE-7966

Regards,
Igor

On Thu, Dec 20, 2018 at 4:29 PM Павлухин Иван <vololo100@gmail.com> wrote:

> Hi,
>
> I prepared a patch with deadlock detection algorithm implementation
> (you can find it from ticket [1]).
>
> Now I would like to discuss options for configuring deadlock
> detection. From a usability standpoint the following needs to be supported:
> 1. Turn deadlock detection on/off (on by default).
> 2. Configure a delay between starting to wait for a lock and sending
> the first deadlock detection message (e.g. 1 second by default).
> 3. Something else?
>
> I can suggest following options:
> 1. Use a single long for configuration. A value < 0 disables deadlock
> detection, 0 starts detection without a delay, and a value > 0 specifies
> a delay in milliseconds before starting detection.
> 2. Use a boolean flag for turning detection on/off. Use a long value
> for configuring delay.
>
> Also, there are a number of ways to pass these values to Ignite. I do not
> have much experience in adding new configuration options to Ignite and
> I need advice here. For example, it could be done by adding
> additional properties to the TransactionConfiguration class. Another way
> is using Java system properties.
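The single-long encoding from option 1 above can be sketched as follows. The class and accessor names are hypothetical, not existing Ignite API:

```java
// Sketch of option 1: one long encodes both the on/off switch and the delay.
// A value < 0 disables detection, 0 starts it immediately, > 0 is a delay in ms.
public class DeadlockDetectionConfig {
    private final long deadlockDetectionDelay;

    public DeadlockDetectionConfig(long deadlockDetectionDelay) {
        this.deadlockDetectionDelay = deadlockDetectionDelay;
    }

    /** Detection is enabled for any non-negative value. */
    public boolean detectionEnabled() {
        return deadlockDetectionDelay >= 0;
    }

    /** Delay before sending the first detection message, in milliseconds. */
    public long detectionDelayMs() {
        return Math.max(deadlockDetectionDelay, 0);
    }
}
```

A boolean-plus-long pair (option 2) would be more self-describing, at the cost of one extra property.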
>
> Igor, Vladimir and others what do you think?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9322
> On Wed, Dec 19, 2018 at 3:51 PM Павлухин Иван <vololo100@gmail.com> wrote:
> >
> > Igor,
> >
> > I see your points. And I agree to start with "lightweight
> > implementation". Today we have 2PL and there is no activity on
> > implementing rollback to savepoint. And if we do it in the future we
> > will have to return to the subject of deadlock detection anyway.
> >
> > I will proceed with "forward-only" approach.
> > On Tue, Dec 18, 2018 at 6:33 PM Seliverstov Igor <gvvinblade@gmail.com> wrote:
> > >
> > > Ivan,
> > >
> > > I would prefer forward-only implementation even knowing it allows false
> > > positive check results.
> > >
> > > Why I think so:
> > >
> > > 1) From my experience, whenever we have a future waiting for a reply, we
> > > have to take failover into consideration.
> > > Usually failover implementations are way more complex than the initial
> > > algorithm itself.
> > > A forward-only version doesn't need any failover implementation as it
> > > expects failed probes (probes that didn't face a deadlock).
> > >
> > > 2) A simple, lightweight implementation is a really good point to
> > > start: it may be extended if needed, but quite often such extension
> > > turns out not to be needed at all.
> > >
> > > 3) Any implementation allows false positive results - we are working
> > > with a distributed system where a bunch of processes happen at the same
> > > time. For example, a concurrent rollback on timeout may resolve a
> > > deadlock, but if the probe has already finished by that time, a rollback
> > > on deadlock happens as well.
> > >
> > > 4) The described case (when a false positive result is probable) is
> > > unusual, potentially extremely rare; I don't think we should solve it
> > > since it doesn't break the grid.
> > >
> > > Regards,
> > > Igor
> > >
> > > On Tue, Dec 18, 2018 at 5:57 PM Павлухин Иван <vololo100@gmail.com> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > During implementation of the edge-chasing deadlock detection algorithm
> > > > in scope of [1] it was realized that we basically have 2 options for
> > > > the "chasing" strategy. I will use the term Near when GridNearTxLocal
> > > > is meant and Dht when GridDhtTxLocal (the tx which updates a primary
> > > > partition) is meant. So, here are the 2 mentioned strategies:
> > > > 1. Send an initial probe when a Dht tx blocks waiting for another tx to
> > > > release a key. The initial probe is sent from the waiting Dht tx to the
> > > > Near tx holding the lock. If the receiving Near tx is blocked as well,
> > > > it relays the probe to the Dht tx it awaits a response from. So, the
> > > > probe travels like Dht -> Near -> Dht -> Near -> ... Let's call this
> > > > approach "forward-only".
> > > > 2. Send probes only between Near transactions. This approach requires
> > > > an additional request-response call which a Near tx issues to a Dht
> > > > node to check whether the tx sending the request is waiting for another
> > > > tx. The call returns the identifier of the Near tx blocking the
> > > > requesting tx (if the requesting tx is in fact blocked). Then the
> > > > waiting Near tx relays a probe to the blocked Near tx. Let's call that
> > > > approach "lock-checking".
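The "forward-only" relay above can be illustrated with a toy model over a plain wait-for map; single-lock waits only, ignoring the AND model and all messaging. Names are illustrative, not Ignite internals:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the "forward-only" strategy: each blocked tx relays the probe
// to the tx it waits for; a deadlock is reported when the probe arrives back
// at its initiator.
public class ForwardOnlyProbe {
    /**
     * @param waitsFor edge map: tx -> tx it is blocked on (absent = not blocked).
     * @param initiator tx that sends the initial probe.
     */
    public static boolean probeFindsCycle(Map<String, String> waitsFor, String initiator) {
        String current = waitsFor.get(initiator); // initial probe goes to the lock holder
        int hops = 0;
        // Hop limit guards against cycles that do not involve the initiator.
        while (current != null && hops++ <= waitsFor.size()) {
            if (current.equals(initiator))
                return true;                      // probe returned: deadlock
            current = waitsFor.get(current);      // relay the probe further
        }
        return false;                             // probe died out (a "failed probe")
    }
}
```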
> > > >
> > > > I would like to define several points to consider while comparing the
> > > > approaches (please point out more if I missed something):
> > > > 1. Correctness, especially regarding reporting false deadlocks.
> > > > 2. Extensibility. Rollback to savepoint and generalization for classic
> > > > transactions should be considered.
> > > > 3. Messaging overhead.
> > > > 4. Simplicity of implementation.
> > > >
> > > > You can find an implementation of the "lock-checking" approach in the
> > > > PR attached to the ticket [1]. Currently it works correctly only for
> > > > transactions obeying strict two-phase locking (2PL). It is fairly easy
> > > > to adapt the PR to implement the "forward-only" approach, which will
> > > > work correctly under 2PL. The "lock-checking" algorithm uses 1.5 times
> > > > more messages than "forward-only". Implementation of the "forward-only"
> > > > algorithm is simpler in the sense that it does not require introducing
> > > > lock-wait-check messages and a future. But to me that does not look
> > > > like a big deal. I think the implementation efforts for both approaches
> > > > are almost equal so far.
> > > >
> > > > But the cornerstone here is extensibility. Let's imagine that we are
> > > > going to implement rollback to savepoint. One suggested way to extend
> > > > the "forward-only" approach for that case is invalidating a sent
> > > > probe. Basically, a tx initiating a deadlock check assigns a unique id
> > > > to the probe. And this probe id is invalidated when the tx wakes up
> > > > from the wait. If that tx receives back a probe with an invalidated id
> > > > it simply discards it. If the id is not invalidated it means a
> > > > deadlock. But a false deadlock can still be detected. I will provide an
> > > > example where it does not work correctly.
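The probe invalidation idea can be sketched roughly like this hypothetical registry (not Ignite code):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of probe invalidation: a tx starting a deadlock check assigns a
// unique probe id; waking up from the wait invalidates it, so a returning
// probe with an invalidated id is discarded. All names are illustrative.
public class ProbeRegistry {
    private final Set<Long> activeProbes = ConcurrentHashMap.newKeySet();
    private long nextId;

    /** Called when a waiting tx initiates a deadlock check. */
    public synchronized long startProbe() {
        long id = ++nextId;
        activeProbes.add(id);
        return id;
    }

    /** Called when the tx wakes up from the wait: invalidate its probe. */
    public void onWakeUp(long probeId) {
        activeProbes.remove(probeId);
    }

    /** A probe returning to its initiator signals a deadlock only if still active. */
    public boolean returnedProbeMeansDeadlock(long probeId) {
        return activeProbes.contains(probeId);
    }
}
```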
> > > >
> > > > Please note that our transactions can request multiple locks
> > > > simultaneously (the so-called AND model). Also, rollback to savepoint
> > > > releases locks obtained after creating the savepoint. Let's use the
> > > > following notation: "*" means savepoint, "^" means rollback to the
> > > > previous savepoint, "!" means acquiring a lock, "?" means waiting for
> > > > a lock, "&" means requesting multiple locks simultaneously. See the
> > > > following transaction execution scenario for 3 transactions and 3
> > > > keys:
> > > > TA      | TB | TC
> > > >         |    | *
> > > >         | 1! | 2!
> > > > 1? & 3! |    |
> > > >         | 2? |
> > > >         |    | ^
> > > >         |    | 3!
> > > >
> > > > We can notice that the suggested scenario does not introduce a
> > > > deadlock because locks are requested in a consistent order among all
> > > > transactions. But at different moments the waiting relations
> > > > TA w TB, TB w TC, TC w TA exist. So, we suspect a false deadlock could
> > > > be detected here. Also, one can notice that the relation TC w TA is
> > > > established after TB w TC is destroyed. Messages on a timeline diagram
> > > > demonstrate how the "forward-only with probe invalidation" approach can
> > > > detect a false deadlock here [2]. On the picture L means "lock", U
> > > > "unlock", W "wait". A red cross means reporting a false deadlock.
> > > > The intuition here is that while a probe is in flight one wait-for edge
> > > > can be destroyed (TB w TC) and another one created (TC w TA). So,
> > > > the approach can report false deadlocks. Also, note that a false
> > > > deadlock can be found even when the locking order is consistent!
> > > >
> > > > After a while I will describe how "lock-checking" can be adapted to
> > > > support rollback to savepoint. The disclaimer here is that it will
> > > > require even more messages.
> > > > ----
> > > >
> > > > I think that we can already start a discussion.
> > > > Actually, while false deadlocks can be surprising, we can perhaps
> > > > tolerate them because our transaction implementation assumes that
> > > > transactions can be rolled back by the system. The "forward-only"
> > > > approach requires fewer messages, but in big-O terms both approaches
> > > > have the same complexity. Both the correctness and complexity
> > > > considerations unfortunately have a hardly predictable impact on real
> > > > workloads. In an ideal world thorough testing could give the answers.
> > > >
> > > > Please, share your vision.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-9322
> > > > [2] https://gist.githubusercontent.com/pavlukhin/c8c7c6266eeab56048c31f5cdfb31d20/raw/7d1aef9abcd014ec9fdf69840168768ced638b6c/msg_diagram.jpg
> > > > On Mon, Nov 26, 2018 at 3:27 PM Павлухин Иван <vololo100@gmail.com> wrote:
> > > > >
> > > > > Vladimir,
> > > > >
> > > > > I think it might work. So, if nobody minds I can start prototyping
> > > > > edge-chasing approach.
> > > > > On Mon, Nov 26, 2018 at 2:32 PM Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > > >
> > > > > > Ivan,
> > > > > >
> > > > > > The problem is that in our system a transaction may wait for N
> > > > > > locks simultaneously. This may form complex graphs which spread
> > > > > > between many nodes. Now consider that I have a deadlock between 4
> > > > > > nodes: A -> B -> *C* -> D -> A. I've sent a message from A and it
> > > > > > never reached D because C failed. Well, maybe that is good, because
> > > > > > some transactions will be rolled back. But when they are rolled
> > > > > > back, other cycles from pending multi-way deadlocks may form again,
> > > > > > e.g. A -> B -> *E* -> D -> A. So the question is whether we will be
> > > > > > able to detect them reliably. I think that we may use the following
> > > > > > assumptions:
> > > > > > 1) If a data node fails, relevant transactions will be rolled back
> > > > > > 2) It means that some other transactions will make at least partial
> > > > > > progress with lock acquisition
> > > > > >
> > > > > > So, we can introduce a kind of counter which is advanced on every
> > > > > > acquired lock on a given node. And define a rule that a transaction
> > > > > > deadlock check for the given pair (NODE_ID, COUNTER) should be
> > > > > > requested at most once. This way, even if a specific deadlock check
> > > > > > request is lost due to a node failure, and another deadlock has
> > > > > > formed, then some other node will re-trigger the deadlock check
> > > > > > request sooner or later, as its counter advanced.
> > > > > >
> > > > > > Makes sense?
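The (NODE_ID, COUNTER) at-most-once rule could look roughly like this toy sketch; all names are illustrative, not Ignite code:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Toy version of the suggested rule: advance a counter on every lock acquired
// on this node, and allow at most one deadlock-check request per
// (nodeId, counter) pair, so a lost request is eventually re-triggered under a
// newer counter value by some other waiter.
public class DeadlockCheckDeduplicator {
    private final AtomicLong lockCounter = new AtomicLong();
    private final Set<String> requested = new HashSet<>();

    /** Called on every lock acquisition on this node. */
    public long onLockAcquired() {
        return lockCounter.incrementAndGet();
    }

    /** @return true if a deadlock check should actually be sent for this pair. */
    public synchronized boolean tryRequestCheck(String nodeId, long counter) {
        return requested.add(nodeId + "#" + counter);
    }
}
```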
> > > > > >
> > > > > > On Sat, Nov 24, 2018 at 5:40 PM Павлухин Иван <vololo100@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Vladimir,
> > > > > > >
> > > > > > > Regarding fault tolerance. It seems that it is not a problem for
> > > > > > > edge-chasing approaches. A found deadlock is identified by a
> > > > > > > message returned to the detection initiator carrying the
> > > > > > > initiator's identifier. If there is no deadlock then such a
> > > > > > > message will not come. If some node containing a deadlocked
> > > > > > > transaction fails then the failure will break the deadlock. Am I
> > > > > > > missing something?
> > > > > > >
> > > > > > > About messaging overhead. Indeed, it looks like edge-chasing
> > > > > > > approaches can bring redundant messages. Perhaps we can borrow
> > > > > > > some ideas about optimization from [1]. And I am also thinking
> > > > > > > about a timeout before starting deadlock detection.
> > > > > > >
> > > > > > > Thoughts about adaptive timeouts lead me to some thoughts about
> > > > > > > monitoring. I assume that long waiting for locks signals
> > > > > > > suboptimal performance of the system. I think it would be great
> > > > > > > to have means of monitoring transaction contention. It could help
> > > > > > > users improve their workloads. It also could help us build some
> > > > > > > reasoning about how contention correlates with deadlock detection
> > > > > > > overhead.
> > > > > > >
> > > > > > > [1] http://mentallandscape.com/Papers_podc84.pdf
> > > > > > > On Sun, Nov 18, 2018 at 10:52 AM Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > > > > >
> > > > > > > > Hi Ivan,
> > > > > > > >
> > > > > > > > Great analysis. Agree that edge-chasing looks like the better
> > > > > > > > candidate. First, it will be applicable to both normal and MVCC
> > > > > > > > transactions. Second, in MVCC we will probably also need to
> > > > > > > > release some locks when doing rollbacks. What we should think
> > > > > > > > about is failover - what if a node which was in the middle of a
> > > > > > > > graph fails? We need to craft fault tolerance carefully.
> > > > > > > >
> > > > > > > > Another critically important point is how to trigger deadlock
> > > > > > > > detection. My concern about edge-chasing was not about latency,
> > > > > > > > but about the number of messages which travel between nodes. A
> > > > > > > > good algorithm must produce little to no messages in case of
> > > > > > > > normal contention while still providing protection in case of
> > > > > > > > real deadlocks. So how would we trigger graph traversal for a
> > > > > > > > given transaction? Maybe we can start with a hard timeout and
> > > > > > > > then employ a kind of adaptive increment in case a high rate of
> > > > > > > > false positives is observed. Maybe something else.
> > > > > > > >
> > > > > > > > On Sun, Nov 18, 2018 at 10:21 AM Павлухин Иван <vololo100@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Vladimir,
> > > > > > > > >
> > > > > > > > > Thanks for the articles! I studied them and a couple of
> > > > > > > > > others. And I would like to share the knowledge I found.
> > > > > > > > >
> > > > > > > > > BACKGROUND
> > > > > > > > > First of all, our algorithm implemented in
> > > > > > > > > org.apache.ignite.internal.processors.cache.transactions.TxDeadlockDetection
> > > > > > > > > is not an edge-chasing algorithm. In essence, a lock-waiting
> > > > > > > > > transaction's site polls the nodes responsible for the keys
> > > > > > > > > of interest one by one and reconstructs the global wait-for
> > > > > > > > > graph (WFG) locally.
> > > > > > > > > The centralized algorithm discussed in this thread looks
> > > > > > > > > similar to the algorithms proposed by Ho [1]. The simplest of
> > > > > > > > > them (two-phased) reports false deadlocks when unlock before
> > > > > > > > > transaction finish is permitted. So, it seems that it only
> > > > > > > > > works when strict two-phase locking (2PL) is obeyed. Another
> > > > > > > > > one (called one-phased) requires tracking all locked keys by
> > > > > > > > > each transaction, which is not desirable for MVCC
> > > > > > > > > transactions.
> > > > > > > > > The aforementioned edge-chasing algorithm by Chandy, Misra
> > > > > > > > > and Haas [2] is proven to work even when 2PL is not obeyed.
> > > > > > > > > Also, the performance target is not clear to me. It looks
> > > > > > > > > like centralized approaches can use fewer messages and
> > > > > > > > > provide lower latency than distributed ones. But I would like
> > > > > > > > > to understand the orders of latency and messaging overhead.
> > > > > > > > > Many of the algorithms are described in [3] and some
> > > > > > > > > performance details are mentioned. It is said there: "As per
> > > > > > > > > [GRAY81], most deadlocks involve two to four transactions." I
> > > > > > > > > see one more argument making deadlock detection latency not
> > > > > > > > > so important: we can consider a deadlock an unavoidable
> > > > > > > > > situation, but even a lightning-fast detector will not save a
> > > > > > > > > database from performance degradation in case of tons of
> > > > > > > > > deadlocks.
> > > > > > > > > What is also interesting, Google Cloud Spanner employs simple
> > > > > > > > > "wound-wait" deadlock prevention [4] instead of any detection
> > > > > > > > > algorithm.
> > > > > > > > >
> > > > > > > > > DISCUSSION
> > > > > > > > > So, I see the following options:
> > > > > > > > > 1. An edge-chasing algorithm like Chandy, Misra and Haas [2].
> > > > > > > > > 2. The centralized two-phased algorithm described by Ho [1]
> > > > > > > > > with the restriction that transactions must obey 2PL.
> > > > > > > > > Personally, I lean toward the edge-chasing approach because
> > > > > > > > > it looks simple and universal. Restricting ourselves to 2PL
> > > > > > > > > will restrict us in features which could be implemented in
> > > > > > > > > the future (e.g. transaction savepoints), won't it?
> > > > > > > > >
> > > > > > > > > WDYT?
> > > > > > > > >
> > > > > > > > > Ivan
> > > > > > > > >
> > > > > > > > > [1] https://turing.cs.hbg.psu.edu/comp512.papers/HoRamamoorthy-82.pdf
> > > > > > > > > [2] https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf
> > > > > > > > > [3] https://www.ics.uci.edu/~cs237/reading/p37-elmagarmid.pdf
> > > > > > > > > [4] https://cloud.google.com/spanner/docs/transactions#rw_transaction_performance
> > > > > > > > >
> > > > > > > > > On Wed, Nov 14, 2018 at 10:13 PM Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Ivan,
> > > > > > > > > >
> > > > > > > > > > This is an interesting question. I think we should spend
> > > > > > > > > > some time on formal verification of whether this algorithm
> > > > > > > > > > works or not. Several articles you may use as a starting
> > > > > > > > > > point: [1], [2]. From what I understand, Ignite falls into
> > > > > > > > > > the "AND" model, and the currently implemented algorithm is
> > > > > > > > > > a variation of the "edge-chasing" approach as per Chandy,
> > > > > > > > > > Misra and Haas [3], which is *proven to be correct* in that
> > > > > > > > > > it both detects deadlocks when they are present and does
> > > > > > > > > > not produce false positives. But it might be too heavy for
> > > > > > > > > > a system under contention.
> > > > > > > > > >
> > > > > > > > > > We need to search for a formal proof of correctness of the
> > > > > > > > > > proposed algorithm. This area is already researched
> > > > > > > > > > thoroughly enough, so we should be able to find an answer
> > > > > > > > > > quickly.
> > > > > > > > > >
> > > > > > > > > > Vladimir.
> > > > > > > > > >
> > > > > > > > > > [1] http://www.cse.scu.edu/~jholliday/dd_9_16.htm
> > > > > > > > > > [2] https://www.cs.uic.edu/~ajayk/Chapter10.pdf
> > > > > > > > > > [3] https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 14, 2018 at 6:55 PM Павлухин Иван <vololo100@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > Next part as promised. The working item for me is a
> > > > > > > > > > > deadlock detector for MVCC transactions [1]. The message
> > > > > > > > > > > is structured in 2 parts. First is an analysis of the
> > > > > > > > > > > current state of affairs and possible options to go with.
> > > > > > > > > > > Second is a proposed option. The first part is going to
> > > > > > > > > > > be not so short, so some might prefer to skip it.
> > > > > > > > > > >
> > > > > > > > > > > ANALYSIS
> > > > > > > > > > > The immediate question is "why can't we use the existing
> > > > > > > > > > > deadlock detector?". The differences between the classic
> > > > > > > > > > > and MVCC transaction implementations are the answer.
> > > > > > > > > > > Currently a collection of IgniteTxEntry is used for
> > > > > > > > > > > detection. But such a collection is not maintained for
> > > > > > > > > > > MVCC transactions. So, it will not work out of the box.
> > > > > > > > > > > Also, it looks like the current distributed iterative
> > > > > > > > > > > approach cannot be low latency in the worst case because
> > > > > > > > > > > it may perform many network requests sequentially.
> > > > > > > > > > > So, what options do we have? Generally we should choose
> > > > > > > > > > > between centralized and distributed approaches. By a
> > > > > > > > > > > centralized approach I mean the existence of a dedicated
> > > > > > > > > > > deadlock detector located on a single node.
> > > > > > > > > > > In the centralized approach we can face difficulties
> > > > > > > > > > > related to failover, as the node running the deadlock
> > > > > > > > > > > detector can fail. In the distributed approach extra
> > > > > > > > > > > network messaging overhead can strike, because different
> > > > > > > > > > > nodes participating in a deadlock can start detection
> > > > > > > > > > > independently and send redundant messages. I see some
> > > > > > > > > > > aspects which matter for choosing an implementation. Here
> > > > > > > > > > > they are, with the approach that is better (roughly
> > > > > > > > > > > speaking) in parentheses:
> > > > > > > > > > > * Detection latency (centralized).
> > > > > > > > > > > * Messaging overhead (centralized).
> > > > > > > > > > > * Failover (distributed).
> > > > > > > > > > > Also, having a park of deadlock detectors does not sound
> > > > > > > > > > > very good. I hope that it is possible to develop a common
> > > > > > > > > > > solution suitable for both kinds of transactions. I
> > > > > > > > > > > suggest piloting the new solution with MVCC and then
> > > > > > > > > > > adapting it for classic transactions.
> > > > > > > > > > >
> > > > > > > > > > > PROPOSAL
> > > > > > > > > > > Actually, I propose to start with the centralized
> > > > > > > > > > > algorithm described by Vladimir at the beginning of the
> > > > > > > > > > > thread. I will try to outline its main points:
> > > > > > > > > > > 1. A single deadlock detector exists in the cluster which
> > > > > > > > > > > maintains the transaction wait-for graph (WFG).
> > > > > > > > > > > 2. Each cluster node sends and invalidates wait-for edges
> > > > > > > > > > > to the detector.
> > > > > > > > > > > 3. The detector periodically searches for cycles in the
> > > > > > > > > > > WFG and chooses and aborts a victim transaction if a
> > > > > > > > > > > cycle is found.
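The periodic cycle search of step 3 can be sketched as a plain DFS over an in-memory WFG; this is a toy model, not the proposed detector itself, and the names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// DFS cycle search over a wait-for graph: tx -> set of txs it waits for.
// Returns one transaction lying on a cycle (a victim candidate), or null if
// the WFG is acyclic.
public class WfgCycleFinder {
    public static String findVictim(Map<String, Set<String>> wfg) {
        // absent = unvisited, FALSE = on the current DFS path, TRUE = finished.
        Map<String, Boolean> state = new HashMap<>();
        for (String tx : wfg.keySet()) {
            String victim = dfs(wfg, tx, state);
            if (victim != null)
                return victim;
        }
        return null;
    }

    private static String dfs(Map<String, Set<String>> wfg, String tx, Map<String, Boolean> state) {
        Boolean s = state.get(tx);
        if (s != null)
            return s ? null : tx; // hitting a node on the current path closes a cycle
        state.put(tx, Boolean.FALSE);
        for (String next : wfg.getOrDefault(tx, Set.of())) {
            String victim = dfs(wfg, next, state);
            if (victim != null)
                return victim;
        }
        state.put(tx, Boolean.TRUE);
        return null;
    }
}
```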
> > > > > > > > > > >
> > > > > > > > > > > Currently I have one fundamental question. Is there a
> > > > > > > > > > > possibility of falsely detected deadlocks because of
> > > > > > > > > > > concurrent WFG updates?
> > > > > > > > > > > Of course there are many points of improvement and
> > > > > > > > > > > optimization. But I would like to start by discussing the
> > > > > > > > > > > key points.
> > > > > > > > > > >
> > > > > > > > > > > Please share your thoughts!
> > > > > > > > > > >
> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9322
> > > > > > > > > > > On Wed, Nov 14, 2018 at 3:47 PM ipavlukhin <vololo100@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Igniters,
> > > > > > > > > > > >
> > > > > > > > > > > > I would like to resume the discussion about a deadlock
> > > > > > > > > > > > detector. I'll start with the motivation for further
> > > > > > > > > > > > work on the subject. As I see it, the current
> > > > > > > > > > > > implementation (entry point
> > > > > > > > > > > > IgniteTxManager.detectDeadlock) starts a detection only
> > > > > > > > > > > > after a transaction has timed out. To my mind this is
> > > > > > > > > > > > not very good from a product usability standpoint. As
> > > > > > > > > > > > you know, in a deadlock situation some keys become
> > > > > > > > > > > > unusable for an infinite amount of time. Currently the
> > > > > > > > > > > > only way to work around it is configuring a timeout,
> > > > > > > > > > > > but in practice it can be rather tricky to choose a
> > > > > > > > > > > > proper/universal value for it. So, I see the main point
> > > > > > > > > > > > as:
> > > > > > > > > > > >
> > > > > > > > > > > > Ability to break deadlocks without a need to configure
> > > > > > > > > > > > timeouts explicitly.
> > > > > > > > > > > >
> > > > > > > > > > > > I will return soon with some thoughts about
> > > > > > > > > > > > implementation. Meanwhile, does anybody have in mind
> > > > > > > > > > > > any other usability points which I am missing? Or are
> > > > > > > > > > > > there any alternative approaches?
> > > > > > > > > > > >
> > > > > > > > > > > > On 2017/11/21 08:32:02, Dmitriy Setrakyan <d...@apache.org> wrote:
> > > > > > > > > > > >  > On Mon, Nov 20, 2017 at 10:15 PM, Vladimir Ozerov <vo...@gridgain.com> wrote:
> > > > > > > > > > > >  > > It doesn't need all txes. Instead, other nodes will send info about
> > > > > > > > > > > >  > > suspicious txes to it from time to time.
> > > > > > > > > > > >  >
> > > > > > > > > > > >  > I see your point, I think it might work.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Ivan Pavlukhin
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best regards,
> > > > > > > > > Ivan Pavlukhin
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Ivan Pavlukhin
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>
