ignite-issues mailing list archives

From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-9322) MVCC: implement deadlock detector
Date Thu, 27 Dec 2018 14:16:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729634#comment-16729634 ]

Vladimir Ozerov edited comment on IGNITE-9322 at 12/27/18 2:15 PM:
-------------------------------------------------------------------

Hi. We conducted a review of the current design with [~Pavlukhin], [~gvvinblade], [~agoncharuk],
[~amashenkov] and [~agura]. Here is the outcome of this discussion:
# We decided that deadlock detection should be triggered after a new "deadlock detection timeout"
expires. It will be added to {{TransactionConfiguration}}. The default value of this timeout
will be equal to the "failure detection timeout", which is 10s.
# We will not trigger deadlock detection on normal transaction timeout, as the current deadlock
detector does. The reason is that our new algorithm does not know whether a deadlock has
happened until it receives a probe with a cycle. As this may never happen, we would need
to wait for some additional timeout *in addition* to the transaction timeout to be more or less
confident that we are not in a deadlock. This is not convenient. An alternative is to notify the
probe originator explicitly that the probe failed to find a deadlock. But unfortunately this is
not possible with our algorithm either: a probe may travel a multi-way path in case of concurrent
lock acquisition (e.g. {{GridNearTxQueryEnlistRequest}}), so we do not know exactly when probing
is completely finished.
# We need to add a counter to the locking engine: when the next key is locked, the counter is
advanced. The current value of the counter should be added to the probe. If a cycle is detected
and the probe returns to us, we must double-check whether the original lock counter is still the
same. If yes, start rollback; otherwise ignore the probe. This is needed to avoid rolling back a
presumably deadlocked transaction which in reality is no longer in a deadlock. One case when this
may happen is a concurrent timeout or rollback of another deadlocked transaction.
# When a probe travels between nodes, we must collect info about all transactions it passes
through. This is needed to detect a cycle when the probe originator is not a part of the deadlock.
E.g. consider a situation when nodes A, B, C are in a deadlock. Then node D tries to acquire a
key locked by A and eventually triggers a probe. In this case it will travel (D -> A ->
B -> C -> A). Info about all passed transactions is needed to stop the probe once we reach
A for the second time.
# Info we must collect about each transaction:
#* {{xid}} - to know which transaction to kill
#* {{near_node_id}} - to know which node to ask for the kill
#* {{lock_counter}} - lock counter of the current transaction, to avoid false-positive rollbacks
as explained in p.3
#* {{start_time}} - start time of the current transaction, to choose a victim as explained below
# The victim transaction should be chosen based on either a "youngest dies" or an "oldest dies"
policy. We need to investigate the literature to find out which algorithm is better. In any case,
the victim will be chosen via comparison of TX infos. First we compare {{start_time}}. If it is
equal, we compare {{xid}}, as it is both comparable and unique inside a cluster. Note that due to
clock drift this algorithm may choose a transaction which is not actually the youngest/oldest.
Rather, this is a best-effort approach which will work correctly most of the time. But this
approach has another critical property: given N different transactions (t1, t2 ... tN), every
node will choose the same victim, thus avoiding killing more than one transaction.
# Consider an optimization: delay the probe start for a given {{(xid, lock_counter)}} pair if
another probe passed through this pair recently. This way we will decrease the number of messages
when several nodes participating in a deadlock want to start probing nearly simultaneously. To
achieve this we must save a timestamp of the last observed probe for the given {{(xid, lock_counter)}}
pair. When the counter is advanced, this timestamp is cleared.
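To make p.3 concrete, here is a minimal sketch of the lock-counter guard. The class and method names are illustrative, not the actual locking engine API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the lock-counter guard from p.3: a probe carries
// the counter value observed when it was sent; a returning probe is only
// acted upon if the transaction has not acquired further locks since then.
public class LockCounterGuard {
    private final AtomicLong lockCounter = new AtomicLong();

    /** Called by the locking engine whenever the next key is locked. */
    public long onKeyLocked() {
        return lockCounter.incrementAndGet();
    }

    /** Counter snapshot to embed into an outgoing probe. */
    public long snapshot() {
        return lockCounter.get();
    }

    /**
     * Called when a probe returns with a detected cycle.
     * @return true if rollback should start, false if the probe is stale.
     */
    public boolean shouldRollback(long probeCounter) {
        // If the counter has advanced, the wait-for edge the probe was built
        // on may no longer exist, so the probe must be ignored.
        return lockCounter.get() == probeCounter;
    }
}
```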
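The probe bookkeeping from p.4 and p.5 might look roughly like this; the {{TxInfo}} fields mirror the list in p.5, while the class names and traversal plumbing are assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative sketch of p.4/p.5: a probe accumulates TxInfo for every
// transaction it passes through; a cycle is detected when the probe reaches
// a transaction already present in its path, which is not necessarily the
// probe originator (as in the D -> A -> B -> C -> A case).
public class DeadlockProbe {
    /** Per-transaction info carried by the probe (fields from p.5). */
    public static final class TxInfo {
        final UUID xid;          // which transaction to kill
        final UUID nearNodeId;   // which node to ask for the kill
        final long lockCounter;  // guards against stale probes (p.3)
        final long startTime;    // used for victim selection (p.6)

        public TxInfo(UUID xid, UUID nearNodeId, long lockCounter, long startTime) {
            this.xid = xid;
            this.nearNodeId = nearNodeId;
            this.lockCounter = lockCounter;
            this.startTime = startTime;
        }
    }

    private final List<TxInfo> path = new ArrayList<>();

    /**
     * Records the next transaction on the probe's path.
     * @return true if this transaction was already visited, i.e. a cycle is found.
     */
    public boolean visit(TxInfo tx) {
        for (TxInfo seen : path) {
            if (seen.xid.equals(tx.xid))
                return true; // cycle: stop the probe here, do not re-add
        }
        path.add(tx);
        return false;
    }

    /** Transactions collected so far; the victim is chosen among these. */
    public List<TxInfo> path() {
        return path;
    }
}
```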
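The deterministic victim choice from p.6 can be sketched as below. Since the "youngest dies" vs "oldest dies" question is still open, both policies are shown; all names are illustrative:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Sketch of p.6: every node orders the transactions of a detected cycle by
// (startTime, xid) and thus picks the same victim deterministically. Clock
// drift may make startTime imprecise, but cross-node agreement only needs
// a consistent total order, not wall-clock accuracy.
public class VictimSelector {
    public static final class Tx {
        final long startTime;
        final UUID xid; // comparable and unique inside a cluster

        public Tx(long startTime, UUID xid) {
            this.startTime = startTime;
            this.xid = xid;
        }
    }

    /** Total order: startTime first, xid as the tie-breaker. */
    private static final Comparator<Tx> BY_AGE = (a, b) -> {
        int c = Long.compare(a.startTime, b.startTime);
        return c != 0 ? c : a.xid.compareTo(b.xid);
    };

    /** "Youngest dies": the transaction with the greatest (startTime, xid). */
    public static Tx youngest(List<Tx> cycle) {
        return Collections.max(cycle, BY_AGE);
    }

    /** "Oldest dies": the transaction with the smallest (startTime, xid). */
    public static Tx oldest(List<Tx> cycle) {
        return Collections.min(cycle, BY_AGE);
    }
}
```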
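The suppression optimization from p.7 might be sketched as follows; the suppression window, the names, and the exact update semantics are assumptions:

```java
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of p.7: remember when a probe was last observed for each
// (xid, lockCounter) pair and suppress a new probe if one passed through
// recently. Advancing the lock counter changes the key itself, so stale
// timestamps for the old counter value are effectively cleared.
public class ProbeThrottle {
    private static final class Key {
        final UUID xid;
        final long lockCounter;

        Key(UUID xid, long lockCounter) {
            this.xid = xid;
            this.lockCounter = lockCounter;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key)o;
            return lockCounter == k.lockCounter && xid.equals(k.xid);
        }

        @Override public int hashCode() {
            return Objects.hash(xid, lockCounter);
        }
    }

    /** Timestamp of the last observed probe per (xid, lockCounter) pair. */
    private final Map<Key, Long> lastProbeTs = new ConcurrentHashMap<>();

    private final long suppressionWindowMs;

    public ProbeThrottle(long suppressionWindowMs) {
        this.suppressionWindowMs = suppressionWindowMs;
    }

    /** Records a probe observation and tells whether a new probe should be sent. */
    public boolean shouldSendProbe(UUID xid, long lockCounter, long nowMs) {
        Long prev = lastProbeTs.put(new Key(xid, lockCounter), nowMs);
        return prev == null || nowMs - prev >= suppressionWindowMs;
    }
}
```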

Future improvements that we consider out of scope of this ticket:
# Add an SQL view and JMX beans to track the deadlock detection process: number of probes sent,
number of probes received, number of detected deadlocks, etc.
# Add detailed deadlock info to the exception and/or node logs to simplify debugging for users.
This may contain transaction labels, locked keys (might be unsafe from a security perspective),
locked statements, etc.
# Investigate other victim selection algorithms, e.g. "least/most acquired locks".
# It seems that we can reuse the same mechanism for "classical" caches and deprecate the existing
deadlock detector. This is a good candidate for AI 3.0 scope.



> MVCC: implement deadlock detector
> ---------------------------------
>
>                 Key: IGNITE-9322
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9322
>             Project: Ignite
>          Issue Type: Task
>          Components: mvcc
>            Reporter: Vladimir Ozerov
>            Assignee: Ivan Pavlukhin
>            Priority: Major
>
> Deadlocks are not uncommon during SQL execution.
> We need to implement a distributed deadlock detection protocol for MVCC. Essentially, nodes
should exchange some map of tx wait lists and try to find a loop. If a loop is found, then
one of the problematic transactions should be rolled back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
