ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aleksey Kuznetsov <alkuznetsov...@gmail.com>
Subject Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx
Date Wed, 28 Feb 2018 16:10:48 GMT
> What if originating(or primary, or couple of primaries) node(s) fails
when tx is suspended?

Thanks for advice, I've added a couple of failover tests. With the
following scenarios :

Started transaction, perform some put, suspend it and primary node fails.
Works as expected : tx rollbacks.
In another scenario, the originating node fails. Works as expected : tx
rollbacks.

> Do we have tests for this feature?

yeah,we do.

ср, 28 февр. 2018 г. в 14:45, Nikolay Izhikov <nizhikov@apache.org>:

> Hello, Aleksey.
>
> > Tests were added for suspending/resuming optimistic transaction earlier
> for parent task.
> > I adopted them for pessimistic mode. Also tx rollback tests were aded.
>
> I think, at least, we need new failover tests.
>
> What if originating(or primary, or couple of primaries) node(s) fails when
> tx is suspended?
>
> > Note, that if user suspends transaction and forgets to resume it,
> > transaction would be rolled back once timeout has occurred.
>
> Do we have tests for this feature?
>
> В Ср, 28/02/2018 в 10:29 +0000, Aleksey Kuznetsov пишет:
> > Thanx for the answer!
> >
> > > Can we remove thread id completely from code?
> > > Can you estimate how much effort do we need?
> >
> > I haven't removed thread id completely from the code because we tightly
> > coupled to it in some parts.
> > For instance, look at the following scenario :
> >
> > "When we suspend transaction, we must make sure only thread started
> > transaction can suspend it. So we compare thread id in
> > `GridNearTxLocal#suspend()`."
> >
> > Note that cache locks also make use of thread id, consider the following
> > scenario:
> >
> > "We create cache lock and perform put operation in the same thread, which
> > create implicit transaction. Transaction would compare both therad ids in
> > tx and cache lock and can apply the optimisation: tx would take advantage
> > of cache lock, and would not create its own lock on key."
> >
> > If we really needed to remove it completely from the code, significant
> > changes are required both to transactions and cache locks.
> >
> > It could take several weeks to completely remove thread id.
> >
> > > As far as I can see from parent task [1] we need some complex tests to
> be
> >
> > implemented.
> > > Are they presented in prototype?
> >
> > Tests were added for suspending/resuming optimistic transaction earlier
> for
> > parent task.
> > I adopted them for pessimistic mode.
> > Also tx rollback tests were aded.  They check whether timeout metrics are
> > correctly calculated for suspended transactions. You can find them in
> > `GridCacheTransactionalAbstractMetricsSelfTest`.
> >
> > Please, feel free to propose additional tests for the ticket.
> >
> >
> > вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov <nizhikov@apache.org>:
> >
> > > Hello, Alexey.
> > >
> > > Great mail, by the way.
> > > I think, it would be great to have this feature in Ignite.
> > >
> > > > I haven't removed thread id completely from code.
> > >
> > > Can we remove thread id completely from code?
> > > Can you estimate how much effort do we need?
> > >
> > > As far as I can see from parent task [1] we need some complex tests to
> be
> > > implemented.
> > > Are they presented in prototype?
> > >
> > > [1]
> > >
> https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
> > >
> > > В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
> > > > Hello Igniters!
> > > >
> > > > Currently we have suspension/resuming implemented for optimistic
> > > > transactions [1].
> > > > Unless suspend/resume isn't supported for pessimistic tx JTA isn't
> fully
> > > > supported [4].
> > > >
> > > > I’m working on a ticket "Suspension/resuming for pessimistic
> > >
> > > transactions"
> > > > [2].
> > > > Goal of the ticket is to support transaction suspend/resume for
> > >
> > > pessimistic
> > > > transactions.
> > > >
> > > > # Benefits of suspending/resuming transaction.
> > > >
> > > >   1. Full JTA standart support.
> > > >   2. Increase of throughput in high-load scenarios.
> > > >       Suspend operation would allow to release Ignite threads and
> > > > optionally perform some other logic.
> > > >
> > > > Note, current API has got suspend/resume only for optimistic
> > >
> > > transactions,
> > > > which confuses users.
> > > >
> > > > # Real life example.
> > > >
> > > > Consider the following scenario:
> > > >
> > > >   1. Application starts Ignite transaction.
> > > >   2. Business logic is executed inside transaction.
> > > >   3. For commit/rollback application need approval message from
> external
> > > > agent.
> > > >   4. Currently, thread inside Ignite is idle until approval is
> received.
> > > >   4a. When suspend/resume support is implemented, application can
> perform
> > > > suspend and release thread inside Ignite.
> > > >
> > > > # How pessimistic transaction works.
> > > >
> > > > When we perform put/get operations in pessimistic transactions, lock
> > > > request is sent to remote nodes by `GridNearLockRequest`.
> > > > Request contains thread id `IgniteTxAdapter#threadId`, in which
> operation
> > > > was performed.
> > > > In pessimistic mode, multiple transaction objects are created - on
> > > > primary, on backup nodes, and on originating node:
> > > > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
> > >
> > > `GridDhtTxRemote`.
> > > >
> > > > Thread id is used in logic on these nodes.
> > > > For instance, to check whether thread has successfully locked the
> key,
> > > > after lock acquisition attempt.
> > > > Or to check whether active transaction exists.
> > > >
> > > > # Main challenge for implementation.
> > > >
> > > > I've analysed implementation approaches and see the core issue:
> > > >
> > > > The essential problem with suspending/resuming lies in thread id
> field
> > > > transferred to remote nodes during put/get operations.
> > > >
> > > > Imagine, we want to suspend transaction and resume it in another
> thread.
> > > > See code snippet below:
> > > >
> > > > ```
> > > > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> > > >
> > > > cache.put(1, 1);
> > > >
> > > > tx.suspend();
> > > > ....
> > > >
> > > > // In another thread.
> > > > tx.resume(); // Thread id will be changed in transaction instance.
> > > > ```
> > > >
> > > > Original thread id is transferred and saved on remote node.
> > > > After resuming thread id on local node differs from remote node.
> > > > I want to avoid one more network round trip to change thread Id on
> remote
> > > > node after transaction resuming.
> > > >
> > > > # Design proposal.
> > > >
> > > > Transaction id (`xid`) can be used instead of thread id on remote
> nodes.
> > > > The following solution is possible for the ticket :
> > > >
> > > > Replace thread id by transaction id for sending to remote nodes.
> > > > Thread id will be removed from the following classes:
> > > > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> > > > `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`,
> > > > `GridDistributedTxPrepareRequest`.
> > > >
> > > > I haven't removed thread id completely from code. Thread id is moved
> to
> > > > `GridNearTxLocal`.
> > > > We still need it in near local transaction for many reasons, for
> example
> > >
> > > to
> > > > assure only thread started transaction can suspend it in
> > > > `GridNearTxLocal#suspend()`.
> > > > In future we can remove thread id completely. I propose to study this
> > > > question in another ticket.
> > > >
> > > > Also, thread id is remained in `GridDistributedLockRequest`.
> > > > Lock request used by cache locks and it need to transfer thread id to
> > > > remote nodes.
> > > > For example to use cache locks along with cache operations put/get,
> see
> > > > `GridNearTxLocal#updateExplicitVersion`.
> > > > As for pessimistic transaction, thread id in
> `GridDistributedLockRequest`
> > > > is set to `UNDEFINED_THREAD_ID`, which means we must not use it
> remotely.
> > > >
> > > > Note, that if user suspends transaction and forgets to resume it,
> > > > transaction would be rolled back once timeout has occurred.
> > > >
> > > > In my design when transaction is suspended, all locked keys remain
> > >
> > > locked.
> > > >
> > > > Please see my prototype of proposal implementation [3].
> > > >
> > > > Proposed changes are relatively small.
> > > > They ensure consistency of information about locks, if thread Id
> will be
> > > > changed within one transaction (by suspend/resume).
> > > > There will be used correct id for locks on remote nodes. It also
> requires
> > > > painstaking work, but changes will not affect the logic of oher
> > >
> > > components.
> > > >
> > > > Tell me please what do you think? Any suggestions and comments will
> be
> > > > helpful.
> > > >
> > > > If you agree with my design I also will do benchmarking.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-5712
> > > > [2] https://issues.apache.org/jira/browse/IGNITE-5714
> > > > [3] https://github.com/apache/ignite/pull/2789
> > > > [4] Section 3.2.3
> > > >
> > >
> > >
> http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
> >
> >

-- 

*Best Regards,*

*Kuznetsov Aleksey*

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message