ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Кузнецов Алексей Львович <alkuznetsov...@gmail.com>
Subject Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx
Date Fri, 20 Apr 2018 09:17:11 GMT
PR for this issue is ready to be reviewed.
Changes are relatively small.

1. No network messages have been changed.
2. Thread id usage replaced with xid usage when we work with candidates.
3. All current tests are passed [3]

Please, review. [1], [2]

[1] https://issues.apache.org/jira/browse/IGNITE-5714
[2] https://github.com/apache/ignite/pull/2789
[3] 
https://ci.ignite.apache.org/viewLog.html?buildId=1229177&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_RunAll

28.02.2018 19:10, Aleksey Kuznetsov пишет:
> > What if originating(or primary, or couple of primaries) node(s) 
> fails when tx is suspended?
>
> Thanks for advice, I've added a couple of failover tests. With the 
> following scenarios :
>
> Started transaction, perform some put, suspend it and primary node 
> fails. Works as expected : tx rollbacks.
> In another scenario, the originating node fails. Works as expected : 
> tx rollbacks.
>
> > Do we have tests for this feature?
>
> yeah,we do.
>
> ср, 28 февр. 2018 г. в 14:45, Nikolay Izhikov <nizhikov@apache.org 
> <mailto:nizhikov@apache.org>>:
>
>     Hello, Aleksey.
>
>     > Tests were added for suspending/resuming optimistic transaction
>     earlier for parent task.
>     > I adopted them for pessimistic mode. Also tx rollback tests were
>     aded.
>
>     I think, at least, we need new failover tests.
>
>     What if originating(or primary, or couple of primaries) node(s)
>     fails when tx is suspended?
>
>     > Note, that if user suspends transaction and forgets to resume it,
>     > transaction would be rolled back once timeout has occurred.
>
>     Do we have tests for this feature?
>
>     В Ср, 28/02/2018 в 10:29 +0000, Aleksey Kuznetsov пишет:
>     > Thanx for the answer!
>     >
>     > > Can we remove thread id completely from code?
>     > > Can you estimate how much effort do we need?
>     >
>     > I haven't removed thread id completely from the code because we
>     tightly
>     > coupled to it in some parts.
>     > For instance, look at the following scenario :
>     >
>     > "When we suspend transaction, we must make sure only thread started
>     > transaction can suspend it. So we compare thread id in
>     > `GridNearTxLocal#suspend()`."
>     >
>     > Note that cache locks also make use of thread id, consider the
>     following
>     > scenario:
>     >
>     > "We create cache lock and perform put operation in the same
>     thread, which
>     > create implicit transaction. Transaction would compare both
>     therad ids in
>     > tx and cache lock and can apply the optimisation: tx would take
>     advantage
>     > of cache lock, and would not create its own lock on key."
>     >
>     > If we really needed to remove it completely from the code,
>     significant
>     > changes are required both to transactions and cache locks.
>     >
>     > It could take several weeks to completely remove thread id.
>     >
>     > > As far as I can see from parent task [1] we need some complex
>     tests to be
>     >
>     > implemented.
>     > > Are they presented in prototype?
>     >
>     > Tests were added for suspending/resuming optimistic transaction
>     earlier for
>     > parent task.
>     > I adopted them for pessimistic mode.
>     > Also tx rollback tests were aded.  They check whether timeout
>     metrics are
>     > correctly calculated for suspended transactions. You can find
>     them in
>     > `GridCacheTransactionalAbstractMetricsSelfTest`.
>     >
>     > Please, feel free to propose additional tests for the ticket.
>     >
>     >
>     > вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov
>     <nizhikov@apache.org <mailto:nizhikov@apache.org>>:
>     >
>     > > Hello, Alexey.
>     > >
>     > > Great mail, by the way.
>     > > I think, it would be great to have this feature in Ignite.
>     > >
>     > > > I haven't removed thread id completely from code.
>     > >
>     > > Can we remove thread id completely from code?
>     > > Can you estimate how much effort do we need?
>     > >
>     > > As far as I can see from parent task [1] we need some complex
>     tests to be
>     > > implemented.
>     > > Are they presented in prototype?
>     > >
>     > > [1]
>     > >
>     https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
>     > >
>     > > В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
>     > > > Hello Igniters!
>     > > >
>     > > > Currently we have suspension/resuming implemented for optimistic
>     > > > transactions [1].
>     > > > Unless suspend/resume isn't supported for pessimistic tx JTA
>     isn't fully
>     > > > supported [4].
>     > > >
>     > > > I’m working on a ticket "Suspension/resuming for pessimistic
>     > >
>     > > transactions"
>     > > > [2].
>     > > > Goal of the ticket is to support transaction suspend/resume for
>     > >
>     > > pessimistic
>     > > > transactions.
>     > > >
>     > > > # Benefits of suspending/resuming transaction.
>     > > >
>     > > >   1. Full JTA standart support.
>     > > >   2. Increase of throughput in high-load scenarios.
>     > > >       Suspend operation would allow to release Ignite
>     threads and
>     > > > optionally perform some other logic.
>     > > >
>     > > > Note, current API has got suspend/resume only for optimistic
>     > >
>     > > transactions,
>     > > > which confuses users.
>     > > >
>     > > > # Real life example.
>     > > >
>     > > > Consider the following scenario:
>     > > >
>     > > >   1. Application starts Ignite transaction.
>     > > >   2. Business logic is executed inside transaction.
>     > > >   3. For commit/rollback application need approval message
>     from external
>     > > > agent.
>     > > >   4. Currently, thread inside Ignite is idle until approval
>     is received.
>     > > >   4a. When suspend/resume support is implemented,
>     application can perform
>     > > > suspend and release thread inside Ignite.
>     > > >
>     > > > # How pessimistic transaction works.
>     > > >
>     > > > When we perform put/get operations in pessimistic
>     transactions, lock
>     > > > request is sent to remote nodes by `GridNearLockRequest`.
>     > > > Request contains thread id `IgniteTxAdapter#threadId`, in
>     which operation
>     > > > was performed.
>     > > > In pessimistic mode, multiple transaction objects are
>     created - on
>     > > > primary, on backup nodes, and on originating node:
>     > > > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
>     > >
>     > > `GridDhtTxRemote`.
>     > > >
>     > > > Thread id is used in logic on these nodes.
>     > > > For instance, to check whether thread has successfully
>     locked the key,
>     > > > after lock acquisition attempt.
>     > > > Or to check whether active transaction exists.
>     > > >
>     > > > # Main challenge for implementation.
>     > > >
>     > > > I've analysed implementation approaches and see the core issue:
>     > > >
>     > > > The essential problem with suspending/resuming lies in
>     thread id field
>     > > > transferred to remote nodes during put/get operations.
>     > > >
>     > > > Imagine, we want to suspend transaction and resume it in
>     another thread.
>     > > > See code snippet below:
>     > > >
>     > > > ```
>     > > > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
>     > > >
>     > > > cache.put(1, 1);
>     > > >
>     > > > tx.suspend();
>     > > > ....
>     > > >
>     > > > // In another thread.
>     > > > tx.resume(); // Thread id will be changed in transaction
>     instance.
>     > > > ```
>     > > >
>     > > > Original thread id is transferred and saved on remote node.
>     > > > After resuming thread id on local node differs from remote node.
>     > > > I want to avoid one more network round trip to change thread
>     Id on remote
>     > > > node after transaction resuming.
>     > > >
>     > > > # Design proposal.
>     > > >
>     > > > Transaction id (`xid`) can be used instead of thread id on
>     remote nodes.
>     > > > The following solution is possible for the ticket :
>     > > >
>     > > > Replace thread id by transaction id for sending to remote nodes.
>     > > > Thread id will be removed from the following classes:
>     > > > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
>     > > > `GridDistributedTxFinishRequest`,
>     `GridDistributedTxFinishResponse`,
>     > > > `GridDistributedTxPrepareRequest`.
>     > > >
>     > > > I haven't removed thread id completely from code. Thread id
>     is moved to
>     > > > `GridNearTxLocal`.
>     > > > We still need it in near local transaction for many reasons,
>     for example
>     > >
>     > > to
>     > > > assure only thread started transaction can suspend it in
>     > > > `GridNearTxLocal#suspend()`.
>     > > > In future we can remove thread id completely. I propose to
>     study this
>     > > > question in another ticket.
>     > > >
>     > > > Also, thread id is remained in `GridDistributedLockRequest`.
>     > > > Lock request used by cache locks and it need to transfer
>     thread id to
>     > > > remote nodes.
>     > > > For example to use cache locks along with cache operations
>     put/get, see
>     > > > `GridNearTxLocal#updateExplicitVersion`.
>     > > > As for pessimistic transaction, thread id in
>     `GridDistributedLockRequest`
>     > > > is set to `UNDEFINED_THREAD_ID`, which means we must not use
>     it remotely.
>     > > >
>     > > > Note, that if user suspends transaction and forgets to
>     resume it,
>     > > > transaction would be rolled back once timeout has occurred.
>     > > >
>     > > > In my design when transaction is suspended, all locked keys
>     remain
>     > >
>     > > locked.
>     > > >
>     > > > Please see my prototype of proposal implementation [3].
>     > > >
>     > > > Proposed changes are relatively small.
>     > > > They ensure consistency of information about locks, if
>     thread Id will be
>     > > > changed within one transaction (by suspend/resume).
>     > > > There will be used correct id for locks on remote nodes. It
>     also requires
>     > > > painstaking work, but changes will not affect the logic of oher
>     > >
>     > > components.
>     > > >
>     > > > Tell me please what do you think? Any suggestions and
>     comments will be
>     > > > helpful.
>     > > >
>     > > > If you agree with my design I also will do benchmarking.
>     > > >
>     > > > [1] https://issues.apache.org/jira/browse/IGNITE-5712
>     > > > [2] https://issues.apache.org/jira/browse/IGNITE-5714
>     > > > [3] https://github.com/apache/ignite/pull/2789
>     > > > [4] Section 3.2.3
>     > > >
>     > >
>     > >
>     http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
>     >
>     > 
>
> -- 
>
> /Best Regards,/
>
> */Kuznetsov Aleksey/*
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message