ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Suggestion to improve deadlock detection
Date Tue, 21 Nov 2017 06:15:35 GMT
The coordinator doesn't need all TXes. Instead, other nodes will periodically
send it info about suspicious TXes.
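
To illustrate the node-side part (steps 1, 3 and 4 of the proposal quoted
below), here is a minimal Python sketch. The Tx class, its fields and
collect_suspicious are hypothetical names used only for illustration, not
Ignite APIs; the 2-second threshold is the example value from the proposal.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tx:
    """Hypothetical stand-in for a node-local transaction object."""
    xid: str
    # XIDs of the transactions this one is queued behind (step 1).
    waits_for: List[str] = field(default_factory=list)
    # Monotonic time when the current lock wait began; None if not waiting.
    wait_started: Optional[float] = None


def collect_suspicious(active_txs, threshold_sec=2.0, now=None) -> Dict[str, List[str]]:
    """Build the {XID: [XIDs it waits for]} report for long-waiting TXes."""
    now = time.monotonic() if now is None else now
    report = {}
    for tx in active_txs:
        if tx.wait_started is None:
            continue  # not blocked on a lock, so never reported
        if now - tx.wait_started > threshold_sec:
            report[tx.xid] = list(tx.waits_for)
    return report
```

A node would call collect_suspicious on a timer (e.g. once per second) and
send a non-empty result to the coordinator. Short transactions never exceed
the threshold, so they never appear in the report.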

On Tue, Nov 21, 2017 at 8:04, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:

> How does it know about all the Txs?
>
> D.
>
> On Nov 20, 2017, at 8:53 PM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> >Dima,
> >
> >What is wrong with the coordinator approach? All it does is analyze a small
> >number of TXes which wait for locks for too long.
> >
> >On Tue, Nov 21, 2017 at 1:16, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
> >
> >> Vladimir,
> >>
> >> I am not sure I like it, mainly due to some coordinator node doing some
> >> periodic checks. For the deadlock detection to work effectively, it has to
> >> be done locally on every node. This may require that every tx request will
> >> carry information about up to N previous keys it accessed, but the
> >> detection will happen locally on the destination node.
> >>
> >> What do you think?
> >>
> >> D.
> >>
> >> On Mon, Nov 20, 2017 at 11:50 AM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> >>
> >> > Igniters,
> >> >
> >> > We are currently working on transactional SQL, and distributed deadlocks
> >> > are a serious problem for us. It looks like the current deadlock detection
> >> > mechanism has several deficiencies:
> >> > 1) It transfers keys! No go for SQL as we may have millions of keys.
> >> > 2) By default we wait for a minute. Way too much IMO.
> >> >
> >> > What if we change it as follows:
> >> > 1) Collect XIDs of all preceding transactions while obtaining a lock
> >> > within the current transaction object. This way we will always have the
> >> > list of TXes we wait for.
> >> > 2) Define a TX deadlock coordinator node.
> >> > 3) Periodically (e.g. once per second), iterate over active transactions
> >> > and detect ones waiting for a lock for too long (e.g. >2-3 sec). Timeouts
> >> > could be adaptive depending on the workload and the false-positive alarm
> >> > rate.
> >> > 4) Send info about those long-running guys to the coordinator in the form
> >> > Map[XID -> List<XID>].
> >> > 5) Rebuild the global wait-for graph on the coordinator and search for
> >> > deadlocks.
> >> > 6) Choose the victim and send the problematic wait-for graph to it.
> >> > 7) The victim collects necessary info (e.g. keys, SQL statements, thread
> >> > IDs, cache IDs, etc.) and throws an exception.
> >> >
> >> > Advantages:
> >> > 1) We ignore short transactions. So if there are tons of short TXes on
> >> > a typical OLTP workload, we will never process many of them.
> >> > 2) Only a minimal set of data is sent between nodes, so we can exchange
> >> > data often without losing performance.
> >> >
> >> > Thoughts?
> >> >
> >> > Vladimir.
> >> >
> >>
>
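
For reference, the coordinator side (steps 5 and 6 of the original proposal)
could be sketched roughly as follows in Python. All function names are
hypothetical and nothing here is an Ignite API. The victim policy (pick the
greatest XID in the cycle) is purely an assumption for the example; the
thread does not say how the victim should be chosen.

```python
def merge_reports(reports):
    """Merge per-node {XID: [XIDs]} reports into one wait-for graph."""
    graph = {}
    for report in reports:
        for xid, blockers in report.items():
            graph.setdefault(xid, set()).update(blockers)
    return graph


def find_cycle(graph):
    """Return one cycle as a list of XIDs, or None. Iterative DFS."""
    visited = set()  # nodes whose outgoing edges are fully explored
    for root in graph:
        if root in visited:
            continue
        path, on_path = [root], {root}
        iters = [iter(sorted(graph.get(root, ())))]
        while iters:
            nxt = next(iters[-1], None)
            if nxt is None:
                # All edges of the last node on the path are explored.
                done = path.pop()
                on_path.discard(done)
                visited.add(done)
                iters.pop()
            elif nxt in on_path:
                # Back edge: the cycle is the path suffix starting at nxt.
                return path[path.index(nxt):]
            elif nxt not in visited:
                path.append(nxt)
                on_path.add(nxt)
                iters.append(iter(sorted(graph.get(nxt, ()))))
    return None


def choose_victim(cycle):
    """Assumed policy for illustration only: the greatest XID loses."""
    return max(cycle)
```

The coordinator would then send the cycle returned by find_cycle to the node
that owns the chosen victim, which gathers the details and throws (step 7).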
