ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Serge Puchnin n <sergey.puch...@gmail.com>
Subject Multi-Version Concurrency Control
Date Tue, 15 Aug 2017 13:11:17 GMT
Hello Ignite Developers, 

I’d like to start a discussion about the design of  MVCC implementation [1].
It’ll help us to solve an issue when a user might see a partially committed transaction.

Below you can find the proposed design. 
Please provide your feedback/thoughts about suggested solution. 

Thanks a lot,
Sergey Puchnin 

[1] https://issues.apache.org/jira/browse/IGNITE-3478

Multi-Version Concurrency Control Architecture


This page contains high-level description of MVCC Architecture for JIRA https://issues.apache.org/jira/browse/IGNITE-3478
Problem Description
Current Ignite SQL does not take into account transaction boundaries. For example, if a transaction
atomically changes the balance of two accounts, then a concurrent SQL query can see a partially
committed transaction. This works for data analytics but does not work for more strict and
demanding scenarios.
It would be perfect if we had a mode which ensures transaction boundaries are taken into account
for SQL query.
Design Description
Multi-Version Concurrency Control (MVCC) is a concurrency control mechanism that has been
implemented in numerous RDBMS systems.
The main approach of this solution is instead of change data in-place a session creates a
new version of data, and different transactions are able to get a correct version in concurrent
Consequently, readers never block writers, writers never block readers, readers don't need
any locks.
In cluster environment data are well distributed via different partitions, nodes and it's
necessary to support cases when read-write transactions are processed by nodes in an accidental
At this point, we have to revise the only transactions that need to update or read data on
more than one partition.
To provide cross-cache ACID transactions for all isolation levels and lock models in distributed
environment it necessary to determinate what changes were made before a transaction was started
and are visible for the transaction.
To solve that new component is provided in the system - TS coordinator and two new attributes
are added to an entity: minTS and maxTS.
The coordinator will order all changes for multi-partition transactions. 
minTS and maxTS will determine a visibility of specific version for a transaction.
A base workflow is include next steps.
1. Initialize
During initialization, a transaction asks a node that acts as a TS coordinator to get a new
transaction identifier (currentTX). The currentTX is a monotonic, unique identifier to order
all changes. 
The coordinator adds current transaction into an active transaction list in RUNNING state.

Also, the coordinator informs the transaction about all transactions were registered before
its and now are in RUNNING state (excluding TX).
2. Writing
On Prepare stage the transaction get all necessary locks.
On Commit stage the transaction:
1. To find current version in PK Index (minTS = currentTX or maxTS = 0)
2. To find current version in Data Page by a link from #1
3. Insert new version into Data Page and set minTS = currentTX, maxTS=0
4. Insert new version into PK Index set minTS = currentTX set maxTS=0 and set link to #2
5. Update current version in Data Page (find via link from #1) set maxTS = currentTX
6. Update current version in PK Index Page (find at #1) set maxTS = currentTX
7. Find current version in Secondary Index (minTS = currentTX or maxTS = 0)
8. Insert new version into Secondary Index set minTS = currentTX set maxTS=0 set link to #4
9. Update current version in Secondary Index set maxTS=currentTX
Due to Two-Phase Commit it's not possible to get a case when two not committed transactions
update the same key. 
For delete statements steps 3, 4, 8 should be skipped.
3. Reading
For a transaction with Repeatable Read, a value will be cached on Near Node.
For other levels, a transaction should find a version for which the following condition holds:
minTS >= CurrentTS and (maxTS < CurrentTS or maxTS = 0). Also minTS, maxTS should not
be in excluding TX list.
For Read Committed, CurrentTS is an identifier on start statement moment. 
For Serialization level, CurrentTS is an identifier on start session moment. In addition to
that if a session has found that maxTS !=0 an error "It's not possible to serialize the result"
should be raised.
4. Garbage collection
During a Garbage collection procedure any entity's versions with maxTS less than minimal currentTX
for sessions with RUNNING state can be deleted. There are no cases when we should show these
values. minTS for next version should be updated to 0.
5. Sample 
As a sample, there are three write operation and check changes in data structures. 


View raw message