ignite-dev mailing list archives

From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: Suggestion to improve deadlock detection
Date Tue, 21 Nov 2017 05:04:32 GMT
How does the coordinator know about all the TXes?

D.

On Nov 20, 2017, at 8:53 PM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
>Dima,
>
>What is wrong with the coordinator approach? All it does is analyze a
>small number of TXes which have been waiting for locks for too long.
>
>On Tue, Nov 21, 2017 at 1:16, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
>
>> Vladimir,
>>
>> I am not sure I like it, mainly due to some coordinator node doing some
>> periodic checks. For the deadlock detection to work effectively, it has
>> to be done locally on every node. This may require that every tx request
>> carry information about up to N previous keys it accessed, but the
>> detection will happen locally on the destination node (a sketch of this
>> idea follows after this message).
>>
>> What do you think?
>>
>> D.
>>
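Taking Dmitriy's suggestion literally, here is a minimal sketch of what the
local check on the destination node might look like, assuming each tx request
carries the keys its sender already holds. TxRequest, LockTable, and the other
names are illustrative assumptions, not existing Ignite APIs:

    import java.util.Set;
    import java.util.UUID;

    /** Lock request as received by the destination node. */
    final class TxRequest {
        final UUID xid;             // requesting transaction
        final Object requestedKey;  // key it wants to lock now
        final Set<Object> heldKeys; // up to N keys it already holds

        TxRequest(UUID xid, Object requestedKey, Set<Object> heldKeys) {
            this.xid = xid;
            this.requestedKey = requestedKey;
            this.heldKeys = heldKeys;
        }
    }

    /** Minimal view of the destination node's local lock state. */
    interface LockTable {
        UUID ownerOf(Object key);             // current lock owner, or null if free
        Set<Object> keysWaitedOnBy(UUID xid); // keys this tx is blocked on here
    }

    final class LocalDeadlockCheck {
        private final LockTable locks;

        LocalDeadlockCheck(LockTable locks) {
            this.locks = locks;
        }

        /** True if granting the request would close a two-tx cycle on this node. */
        boolean wouldDeadlock(TxRequest req) {
            UUID owner = locks.ownerOf(req.requestedKey);
            if (owner == null || owner.equals(req.xid))
                return false; // key is free or already ours
            // Deadlock if the owner is itself blocked on a key the requester holds.
            for (Object key : locks.keysWaitedOnBy(owner))
                if (req.heldKeys.contains(key))
                    return true;
            return false;
        }
    }

Note that this sketch only catches a cycle whose other participant is blocked
on the same node; cycles spanning locks on several nodes still need a global
view, which is what the coordinator proposal quoted below provides.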
>> On Mon, Nov 20, 2017 at 11:50 AM, Vladimir Ozerov <vozerov@gridgain.com>
>> wrote:
>>
>> > Igniters,
>> >
>> > We are currently working on transactional SQL, and distributed
>> > deadlocks are a serious problem for us. It looks like the current
>> > deadlock detection mechanism has several deficiencies:
>> > 1) It transfers keys! That is a no-go for SQL, as we may have millions
>> > of keys.
>> > 2) By default we wait for a minute. Way too long, IMO.
>> >
>> > What if we change it as follows:
>> > 1) Collect the XIDs of all preceding transactions while obtaining a
>> > lock within the current transaction object. This way we will always
>> > have the list of TXes we wait for.
>> > 2) Define a TX deadlock coordinator node.
>> > 3) Periodically (e.g. once per second), iterate over active
>> > transactions and detect ones that have been waiting for a lock for too
>> > long (e.g. >2-3 sec). Timeouts could be adaptive depending on the
>> > workload and the false-positive alarm rate.
>> > 4) Send info about those long-waiting transactions to the coordinator
>> > in the form Map[XID -> List<XID>].
>> > 5) Rebuild the global wait-for graph on the coordinator and search it
>> > for deadlocks (a sketch of this cycle search appears at the end of
>> > this message).
>> > 6) Choose a victim and send the problematic wait-for graph to it.
>> > 7) The victim collects the necessary info (e.g. keys, SQL statements,
>> > thread IDs, cache IDs, etc.) and throws an exception.
>> >
>> > Advantages:
>> > 1) We ignore short transactions. So if there are tons of short TXes in
>> > a typical OLTP workload, we will never see many of them.
>> > 2) Only a minimal set of data is sent between nodes, so we can
>> > exchange data often without losing performance.
>> >
>> > Thoughts?
>> >
>> > Vladimir.
>> >
>>
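For reference, a minimal sketch of the cycle search from steps 4 and 5 of the
proposal quoted above: the coordinator merges the per-node
Map[XID -> List<XID>] reports into one wait-for graph and runs a depth-first
search for a cycle. The class and method names are illustrative assumptions,
not existing Ignite APIs:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.Set;
    import java.util.UUID;

    /** Global wait-for graph, merged on the coordinator from per-node reports. */
    final class WaitForGraph {
        // XID -> XIDs it waits for (step 4's Map[XID -> List<XID>], merged).
        private final Map<UUID, List<UUID>> waitsFor = new HashMap<>();

        void merge(Map<UUID, List<UUID>> nodeReport) {
            nodeReport.forEach((xid, deps) ->
                waitsFor.computeIfAbsent(xid, k -> new ArrayList<>()).addAll(deps));
        }

        /** Returns one deadlock cycle as a list of XIDs, or empty if none exists. */
        Optional<List<UUID>> findCycle() {
            Set<UUID> done = new HashSet<>();
            for (UUID start : waitsFor.keySet()) {
                List<UUID> path = new ArrayList<>();
                if (dfs(start, path, new HashSet<>(), done))
                    return Optional.of(path);
            }
            return Optional.empty();
        }

        private boolean dfs(UUID xid, List<UUID> path, Set<UUID> onPath, Set<UUID> done) {
            if (onPath.contains(xid)) {
                // Found a back edge; trim the path down to the cycle itself.
                path.subList(0, path.indexOf(xid)).clear();
                return true;
            }
            if (done.contains(xid))
                return false; // already explored, no cycle through here
            path.add(xid);
            onPath.add(xid);
            for (UUID dep : waitsFor.getOrDefault(xid, Collections.emptyList()))
                if (dfs(dep, path, onPath, done))
                    return true;
            path.remove(path.size() - 1);
            onPath.remove(xid);
            done.add(xid);
            return false;
        }
    }

From the returned cycle the coordinator can then pick a victim (for example,
the youngest XID) and ship the offending subgraph to it, as in steps 6 and 7.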
