ignite-dev mailing list archives

From Павлухин Иван <vololo...@gmail.com>
Subject Re: Tx lock partial happens before
Date Tue, 16 Jul 2019 06:28:27 GMT
Anton,

Thank you for clarification.

On Tue, Jul 16, 2019 at 09:24, Anton Vinogradov <av@apache.org> wrote:
>
> Ivan R.
>
> Thanks.
> I'll try to implement the approach you proposed.
>
> Ivan P.
>
> >> what prevents primary partition relocation when
> >> Read Repair is in progress? Is there a transaction or an explicit lock?
> Did you mean partition eviction?
> RR is almost a regular get with the same logic. It maps onto some topology
> and performs regular gets.
> In case a node has failed or is not an owner anymore, we just ignore it.
> See the code for details:
>
> if (invalidNodeSet.contains(affNode) || !cctx.discovery().alive(affNode)) {
>     onDone(Collections.emptyMap()); // Finishing the mini future with just "ok".
> }
>
> On Tue, Jul 16, 2019 at 9:04 AM Павлухин Иван <vololo100@gmail.com> wrote:
>
> > Anton,
> >
> > You referenced failover scenarios. I believe that everything is
> > described in the IEP. But to make this discussion self-sufficient, could
> > you please outline what prevents primary partition relocation when
> > Read Repair is in progress? Is there a transaction or an explicit lock?
> >
> > On Mon, Jul 15, 2019 at 23:49, Ivan Rakov <ivan.glukos@gmail.com> wrote:
> > >
> > > Anton,
> > >
> > > > Step-by-step:
> > > > 1) primary locked on key access (get/put) at pessimistic/!read-committed tx
> > > > 2) backups locked on prepare
> > > > 3) primary unlocked on finish
> > > > 4) backups unlocked on finish (after the primary)
> > > > correct?
> > > Yes, this corresponds to my understanding of the transaction protocol, with a
> > > minor exception: steps 3 and 4 are inverted in case of one-phase commit.
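> > >
> > > As an illustration of that ordering (a sketch with made-up names, not Ignite code):
> > >
> > > enum TxLockEvent {
> > >     PRIMARY_LOCKED_ON_KEY_ACCESS, // 1) pessimistic/!read-committed get/put
> > >     BACKUPS_LOCKED_ON_PREPARE,    // 2) 2PC prepare
> > >     PRIMARY_UNLOCKED_ON_FINISH,   // 3) primary finish
> > >     BACKUPS_UNLOCKED_ON_FINISH    // 4) backups finish (happens before 3 under
> > >                                   //    one-phase commit)
> > > }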
> > >
> > > > Agree, but it seems there is no need to acquire the lock; we just have to wait
> > > > until the entry becomes unlocked.
> > > > - entry locked means that the previous tx's "finish" phase is in progress
> > > > - entry unlocked means the read value is up-to-date (the previous "finish" phase
> > > > has finished)
> > > > correct?
> > > Diving deeper, an entry is locked if its GridCacheMapEntry.localCandidates
> > > queue is not empty (the first item in the queue is the transaction that actually
> > > owns the lock).
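> > >
> > > Roughly, as a toy model (illustrative names only, not the real GridCacheMapEntry API):
> > >
> > > import java.util.ArrayDeque;
> > > import java.util.Deque;
> > > import java.util.concurrent.TimeUnit;
> > >
> > > /** Toy model of the above: an entry counts as locked while its local
> > >  *  candidates queue is non-empty; the head of the queue owns the lock. */
> > > final class EntryLockModel {
> > >     private final Deque<Long> localCandidates = new ArrayDeque<>(); // queued tx ids
> > >
> > >     synchronized boolean locked() {
> > >         return !localCandidates.isEmpty();
> > >     }
> > >
> > >     synchronized Long lockOwner() {
> > >         return localCandidates.peekFirst(); // first candidate actually owns the lock
> > >     }
> > >
> > >     /** Spin until the previous tx's "finish" phase releases the lock, or time out. */
> > >     boolean awaitUnlocked(long timeoutMs) throws InterruptedException {
> > >         long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
> > >         while (locked()) {
> > >             if (System.nanoTime() > deadline)
> > >                 return false; // still locked: previous finish phase not done yet
> > >             TimeUnit.MILLISECONDS.sleep(1);
> > >         }
> > >         return true; // unlocked: a value read now is up to date
> > >     }
> > > }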
> > >
> > > > we just have to wait
> > > > until the entry becomes unlocked.
> > > This may work.
> > > If the consistency checking code has acquired the lock on the primary, the backup
> > > can be in two states:
> > > - not locked - and new locks won't appear, as we are holding the lock on the primary
> > > - still locked by the transaction that owned the lock on the primary just before
> > > our checking code - in this case the checking code should just wait for the lock
> > > release
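> > >
> > > I.e., roughly (a sketch of the checking flow only, not the actual Read Repair code):
> > >
> > > import java.util.List;
> > > import java.util.Objects;
> > > import java.util.concurrent.TimeUnit;
> > >
> > > /** Toy model of the two states above: the caller already holds the tx lock on
> > >  *  the primary, so each backup is either unlocked or still being finished by
> > >  *  the previous tx; in the latter case the check just waits it out. */
> > > final class ReadRepairWaitCheck {
> > >     interface Replica {
> > >         boolean locked();          // previous tx "finish" phase still in progress?
> > >         Object value(String key);
> > >     }
> > >
> > >     static boolean consistent(String key, Replica primary, List<Replica> backups)
> > >         throws InterruptedException {
> > >         Object expected = primary.value(key); // safe: tx lock on primary is held
> > >
> > >         for (Replica backup : backups) {
> > >             while (backup.locked())           // no new locks can appear meanwhile
> > >                 TimeUnit.MILLISECONDS.sleep(1);
> > >
> > >             if (!Objects.equals(expected, backup.value(key)))
> > >                 return false;                 // a real inconsistency, repair needed
> > >         }
> > >         return true;
> > >     }
> > > }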
> > >
> > > Best Regards,
> > > Ivan Rakov
> > >
> > > On 15.07.2019 9:34, Anton Vinogradov wrote:
> > > > Ivan R.
> > > >
> > > > Thanks for joining!
> > > >
> > > > I got the idea, but I'm not sure I see the way to fix it.
> > > >
> > > > AFAIK (I can be wrong, please correct me if necessary), at 2PC, locks are
> > > > acquired on backups during the "prepare" phase and released at the "finish"
> > > > phase after the primary has fully committed.
> > > > Step-by-step:
> > > > 1) primary locked on key access (get/put) at pessimistic/!read-committed tx
> > > > 2) backups locked on prepare
> > > > 3) primary unlocked on finish
> > > > 4) backups unlocked on finish (after the primary)
> > > > correct?
> > > >
> > > > So, acquiring locks on backups outside of the "prepare" phase may cause
> > > > unexpected behavior in case of a primary failure or other errors.
> > > > It's definitely possible to update the failover logic to solve this issue, but
> > > > that seems to be an overcomplicated way.
> > > > The main question is: is there any simple way?
> > > >
> > > >>> checking read from backup will just wait for commit if it's in
> > > >>> progress.
> > > > Agree, but it seems there is no need to acquire the lock; we just have to wait
> > > > until the entry becomes unlocked.
> > > > - entry locked means that the previous tx's "finish" phase is in progress
> > > > - entry unlocked means the read value is up-to-date (the previous "finish" phase
> > > > has finished)
> > > > correct?
> > > >
> > > > On Mon, Jul 15, 2019 at 8:37 AM Павлухин Иван <vololo100@gmail.com> wrote:
> > > >
> > > >> Anton,
> > > >>
> > > >> I did not know about the mechanics of locking entries on backups during the
> > > >> prepare phase. Thank you for pointing that out!
> > > >>
> > > >> On Fri, Jul 12, 2019 at 22:45, Ivan Rakov <ivan.glukos@gmail.com> wrote:
> > > >>> Hi Anton,
> > > >>>
> > > >>>> Each get method now checks the consistency.
> > > >>>> Check means:
> > > >>>> 1) tx lock acquired on primary
> > > >>>> 2) gained data from each owner (primary and backups)
> > > >>>> 3) data compared
> > > >>> Did you consider acquiring locks on backups as well during your check,
> > > >>> just like 2PC prepare does?
> > > >>> If there's HB between steps 1 (lock primary) and 2 (update primary +
> > > >>> lock backup + update backup), you may be sure that there will be no
> > > >>> false-positive results and no deadlocks as well. Protocol won't be
> > > >>> complicated: checking read from backup will just wait for commit if it's
> > > >>> in progress.
> > > >>>
> > > >>> Best Regards,
> > > >>> Ivan Rakov
> > > >>>
> > > >>> On 12.07.2019 9:47, Anton Vinogradov wrote:
> > > >>>> Igniters,
> > > >>>>
> > > >>>> Let me explain the problem in detail.
> > > >>>> Read Repair at a pessimistic tx (locks acquired on primary, full sync, 2PC)
> > > >>>> is able to see a consistency violation because backups are not updated yet.
> > > >>>> It seems to be not a good idea to "fix" the code to unlock the primary only
> > > >>>> when backups are updated; this will definitely cause a performance drop.
> > > >>>> Currently, there is no explicit sync feature that allows waiting for backups
> > > >>>> to be updated during the previous tx.
> > > >>>> The previous tx just sends GridNearTxFinishResponse to the originating
> > > >>>> node.
> > > >>>> Bad ideas how to handle this:
> > > >>>> - retry a few times (still possible to get a false positive)
> > > >>>> - lock the tx entry on backups (will definitely break failover logic)
> > > >>>> - wait for the same entry version on backups during some timeout (will
> > > >>>> require huge changes to the "get" logic, and a false positive is still possible)
> > > >>>>
> > > >>>> Is there any simple fix for this issue?
> > > >>>> Thanks for tips in advance.
> > > >>>>
> > > >>>> Ivan,
> > > >>>> thanks for your interest
> > > >>>>
> > > >>>>>> 4. Very fast and lucky txB writes a value 2 for the key on primary and
> > > >>>>>> backup.
> > > >>>> AFAIK, reordering is not possible since backups are "prepared" before the
> > > >>>> primary releases the lock.
> > > >>>> So, consistency is guaranteed by failover and by the "prepare" feature of 2PC.
> > > >>>> It seems the problem is NOT with consistency at AI, but with the consistency
> > > >>>> detection implementation (RR) and possible "false positive" results.
> > > >>>> BTW, I checked the 1PC case (only one data node at the test) and got no
> > > >>>> issues.
> > > >>>> On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван <vololo100@gmail.com>
> > > >>>> wrote:
> > > >>>>> Anton,
> > > >>>>>
> > > >>>>> Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
> > > >>>>> mean that things can be even worse and an inconsistent write is
> > > >>>>> possible on a backup? E.g. in the following scenario:
> > > >>>>> 1. txA writes a value 1 for the key on primary.
> > > >>>>> 2. txA unlocks the key on primary.
> > > >>>>> 3. txA freezes before updating backup.
> > > >>>>> 4. Very fast and lucky txB writes a value 2 for the key on primary and
> > > >>>>> backup.
> > > >>>>> 5. txA wakes up and writes 1 for the key on backup.
> > > >>>>> 6. As a result there is 2 on primary and 1 on backup.
> > > >>>>>
> > > >>>>> Naively it seems that locks should be released after all replicas are
> > > >>>>> updated.
> > > >>>>>
> > > >>>>> On Wed, Jul 10, 2019 at 16:36, Anton Vinogradov <av@apache.org> wrote:
> > > >>>>>> Folks,
> > > >>>>>>
> > > >>>>>> I'm now investigating unexpected repairs [1] in case of ReadRepair usage
> > > >>>>>> at testAccountTxNodeRestart.
> > > >>>>>> I updated [2] the test to check whether any repairs happen.
> > > >>>>>> The test's name is now "testAccountTxNodeRestartWithReadRepair".
> > > >>>>>>
> > > >>>>>> Each get method now checks the consistency.
> > > >>>>>> Check means:
> > > >>>>>> 1) tx lock acquired on primary
> > > >>>>>> 2) gained data from each owner (primary and backups)
> > > >>>>>> 3) data compared
> > > >>>>>>
> > > >>>>>> Sometimes, a backup may have an obsolete value during such a check.
> > > >>>>>>
> > > >>>>>> It seems this happens because the tx commit on the primary goes in the
> > > >>>>>> following way (check the code [3] for details):
> > > >>>>>> 1) performing localFinish (releases tx lock)
> > > >>>>>> 2) performing dhtFinish (commits on backups)
> > > >>>>>> 3) transferring control back to the caller
> > > >>>>>>
> > > >>>>>> So, it seems the problem here is that "tx lock released on primary" does
> > > >>>>>> not mean that backups are updated, but "commit() method finished at the
> > > >>>>>> caller's thread" does.
> > > >>>>>> This means that, currently, there is no happens-before between
> > > >>>>>> 1) thread 1 committed data on primary and the tx lock can be re-obtained
> > > >>>>>> 2) thread 2 reads from backup
> > > >>>>>> but there is still a strong HB between "commit() finished" and "backup
> > > >>>>>> updated".
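> > > >>>>>>
> > > >>>>>> In other words (a sketch of the ordering only, not the actual
> > > >>>>>> GridDhtTxLocal#finishTx code):
> > > >>>>>>
> > > >>>>>> /** The tx lock on the primary is released by localFinish, before dhtFinish
> > > >>>>>>  *  has committed the backups, so another thread may re-take the lock and
> > > >>>>>>  *  read a stale backup, even though the original caller regains control
> > > >>>>>>  *  only after both phases. */
> > > >>>>>> final class CommitOrdering {
> > > >>>>>>     static void commit(Runnable localFinish, Runnable dhtFinish) {
> > > >>>>>>         localFinish.run(); // 1) tx lock on primary is re-obtainable from here
> > > >>>>>>         dhtFinish.run();   // 2) backups get committed
> > > >>>>>>         // 3) control returns to the caller: "commit() finished" HB "backups
> > > >>>>>>         //    updated", but "primary lock released" does NOT HB "backups updated".
> > > >>>>>>     }
> > > >>>>>> }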
> > > >>>>>>
> > > >>>>>> So, it seems to be possible, for example, to get a notification from a
> > > >>>>>> continuous query, then read from backup and get an obsolete value.
> > > >>>>>>
> > > >>>>>> Is this "partial happens before" behavior expected?
> > > >>>>>>
> > > >>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > > >>>>>> [2] https://github.com/apache/ignite/pull/6679/files
> > > >>>>>> [3]
> > > >>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Best regards,
> > > >>>>> Ivan Pavlukhin
> > > >>>>>
> > > >>
> > > >>
> > > >> --
> > > >> Best regards,
> > > >> Ivan Pavlukhin
> > > >>
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >



-- 
Best regards,
Ivan Pavlukhin
