ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nikolay Izhikov <nizhi...@apache.org>
Subject Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx
Date Wed, 28 Feb 2018 11:45:22 GMT
Hello, Aleksey.

> Tests were added for suspending/resuming optimistic transaction earlier for parent task.
> I adopted them for pessimistic mode. Also tx rollback tests were aded.

I think, at least, we need new failover tests.

What if originating(or primary, or couple of primaries) node(s) fails when tx is suspended?

> Note, that if user suspends transaction and forgets to resume it,
> transaction would be rolled back once timeout has occurred.

Do we have tests for this feature?

В Ср, 28/02/2018 в 10:29 +0000, Aleksey Kuznetsov пишет:
> Thanx for the answer!
> 
> > Can we remove thread id completely from code?
> > Can you estimate how much effort do we need?
> 
> I haven't removed thread id completely from the code because we tightly
> coupled to it in some parts.
> For instance, look at the following scenario :
> 
> "When we suspend transaction, we must make sure only thread started
> transaction can suspend it. So we compare thread id in
> `GridNearTxLocal#suspend()`."
> 
> Note that cache locks also make use of thread id, consider the following
> scenario:
> 
> "We create cache lock and perform put operation in the same thread, which
> create implicit transaction. Transaction would compare both therad ids in
> tx and cache lock and can apply the optimisation: tx would take advantage
> of cache lock, and would not create its own lock on key."
> 
> If we really needed to remove it completely from the code, significant
> changes are required both to transactions and cache locks.
> 
> It could take several weeks to completely remove thread id.
> 
> > As far as I can see from parent task [1] we need some complex tests to be
> 
> implemented.
> > Are they presented in prototype?
> 
> Tests were added for suspending/resuming optimistic transaction earlier for
> parent task.
> I adopted them for pessimistic mode.
> Also tx rollback tests were aded.  They check whether timeout metrics are
> correctly calculated for suspended transactions. You can find them in
> `GridCacheTransactionalAbstractMetricsSelfTest`.
> 
> Please, feel free to propose additional tests for the ticket.
> 
> 
> вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov <nizhikov@apache.org>:
> 
> > Hello, Alexey.
> > 
> > Great mail, by the way.
> > I think, it would be great to have this feature in Ignite.
> > 
> > > I haven't removed thread id completely from code.
> > 
> > Can we remove thread id completely from code?
> > Can you estimate how much effort do we need?
> > 
> > As far as I can see from parent task [1] we need some complex tests to be
> > implemented.
> > Are they presented in prototype?
> > 
> > [1]
> > https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
> > 
> > В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
> > > Hello Igniters!
> > > 
> > > Currently we have suspension/resuming implemented for optimistic
> > > transactions [1].
> > > Unless suspend/resume isn't supported for pessimistic tx JTA isn't fully
> > > supported [4].
> > > 
> > > I’m working on a ticket "Suspension/resuming for pessimistic
> > 
> > transactions"
> > > [2].
> > > Goal of the ticket is to support transaction suspend/resume for
> > 
> > pessimistic
> > > transactions.
> > > 
> > > # Benefits of suspending/resuming transaction.
> > > 
> > >   1. Full JTA standart support.
> > >   2. Increase of throughput in high-load scenarios.
> > >       Suspend operation would allow to release Ignite threads and
> > > optionally perform some other logic.
> > > 
> > > Note, current API has got suspend/resume only for optimistic
> > 
> > transactions,
> > > which confuses users.
> > > 
> > > # Real life example.
> > > 
> > > Consider the following scenario:
> > > 
> > >   1. Application starts Ignite transaction.
> > >   2. Business logic is executed inside transaction.
> > >   3. For commit/rollback application need approval message from external
> > > agent.
> > >   4. Currently, thread inside Ignite is idle until approval is received.
> > >   4a. When suspend/resume support is implemented, application can perform
> > > suspend and release thread inside Ignite.
> > > 
> > > # How pessimistic transaction works.
> > > 
> > > When we perform put/get operations in pessimistic transactions, lock
> > > request is sent to remote nodes by `GridNearLockRequest`.
> > > Request contains thread id `IgniteTxAdapter#threadId`, in which operation
> > > was performed.
> > > In pessimistic mode, multiple transaction objects are created - on
> > > primary, on backup nodes, and on originating node:
> > > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
> > 
> > `GridDhtTxRemote`.
> > > 
> > > Thread id is used in logic on these nodes.
> > > For instance, to check whether thread has successfully locked the key,
> > > after lock acquisition attempt.
> > > Or to check whether active transaction exists.
> > > 
> > > # Main challenge for implementation.
> > > 
> > > I've analysed implementation approaches and see the core issue:
> > > 
> > > The essential problem with suspending/resuming lies in thread id field
> > > transferred to remote nodes during put/get operations.
> > > 
> > > Imagine, we want to suspend transaction and resume it in another thread.
> > > See code snippet below:
> > > 
> > > ```
> > > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> > > 
> > > cache.put(1, 1);
> > > 
> > > tx.suspend();
> > > ....
> > > 
> > > // In another thread.
> > > tx.resume(); // Thread id will be changed in transaction instance.
> > > ```
> > > 
> > > Original thread id is transferred and saved on remote node.
> > > After resuming thread id on local node differs from remote node.
> > > I want to avoid one more network round trip to change thread Id on remote
> > > node after transaction resuming.
> > > 
> > > # Design proposal.
> > > 
> > > Transaction id (`xid`) can be used instead of thread id on remote nodes.
> > > The following solution is possible for the ticket :
> > > 
> > > Replace thread id by transaction id for sending to remote nodes.
> > > Thread id will be removed from the following classes:
> > > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> > > `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`,
> > > `GridDistributedTxPrepareRequest`.
> > > 
> > > I haven't removed thread id completely from code. Thread id is moved to
> > > `GridNearTxLocal`.
> > > We still need it in near local transaction for many reasons, for example
> > 
> > to
> > > assure only thread started transaction can suspend it in
> > > `GridNearTxLocal#suspend()`.
> > > In future we can remove thread id completely. I propose to study this
> > > question in another ticket.
> > > 
> > > Also, thread id is remained in `GridDistributedLockRequest`.
> > > Lock request used by cache locks and it need to transfer thread id to
> > > remote nodes.
> > > For example to use cache locks along with cache operations put/get, see
> > > `GridNearTxLocal#updateExplicitVersion`.
> > > As for pessimistic transaction, thread id in `GridDistributedLockRequest`
> > > is set to `UNDEFINED_THREAD_ID`, which means we must not use it remotely.
> > > 
> > > Note, that if user suspends transaction and forgets to resume it,
> > > transaction would be rolled back once timeout has occurred.
> > > 
> > > In my design when transaction is suspended, all locked keys remain
> > 
> > locked.
> > > 
> > > Please see my prototype of proposal implementation [3].
> > > 
> > > Proposed changes are relatively small.
> > > They ensure consistency of information about locks, if thread Id will be
> > > changed within one transaction (by suspend/resume).
> > > There will be used correct id for locks on remote nodes. It also requires
> > > painstaking work, but changes will not affect the logic of oher
> > 
> > components.
> > > 
> > > Tell me please what do you think? Any suggestions and comments will be
> > > helpful.
> > > 
> > > If you agree with my design I also will do benchmarking.
> > > 
> > > [1] https://issues.apache.org/jira/browse/IGNITE-5712
> > > [2] https://issues.apache.org/jira/browse/IGNITE-5714
> > > [3] https://github.com/apache/ignite/pull/2789
> > > [4] Section 3.2.3
> > > 
> > 
> > http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
> 
> 
Mime
View raw message