ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Roman Kondakov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-9592) MVCC: Use linked lists to store multiple versions.
Date Fri, 14 Sep 2018 08:46:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Roman Kondakov updated IGNITE-9592:
-----------------------------------
    Description: 
Currently we store all versions of each row in primary index. It is not very efficient for
several reasons:
 * We have to insert and remove tree item each time a new version is created or an old version
deleted. This leads to a considerable tree operations number increasing as well as extra tree
splits and merges operations.
 * Also this leads to a contention on leaf page write lock - each update operation has
to obtain exclusive access to insert a new version or remove old version of row. During this
update no body on that leaf can not be able to update or even read data of neighbour keys.
 * Primary key tree consumes more space if it stores each version.
 * Other vendors do not store each version in primary indexes (as far as I know).

Possible solution - store only key and link to the newest version in the primary index. Instead
of this {{CacheDataTree}} item
 {{|           key part          |           |        
|}}
 {{|-----------------------------|  lockVer  |   link  |}}
 {{| cache_id | hash |  mvccVer  |           |         |}}
  we'll have:
 {{|   key part      |           |  link to the   |}}
 {{|-----------------|  lockVer  |     newest     |}}
 {{| cache_id | hash |           |    version     |}}
 Note, we do not have mvccVer in tree item. Link to the newer version leads to the most recent
inserted data item. To find older versions, each DatRow is provided with "prevLink" - the
link to previous version of row. DataRow layout can be changed from
|header|xid_max|xid_min |KV bytes|

to the next one:
|header|xid_max|xid_min|*PREV_LINK*|KV bytes|

Where *PREV_LINK* field points to the previous version of the row.  Traversing this prevRow
links we can iterate over all available versions without affecting primary key tree.

When the new version is created we just insert the new row in datastore, then do CAS on the
link to the newest version in primary key tree in order it points to the new row. PrevLink
in the new row should point on the previous one. That is all. We've just inserted a new version
just with a long field CAS in the CacheDataTree. Without obtaining write lock and other headache.

Secondary indexes are handled in the same manner as before.

  was:
Currently we store all versions of each row in primary index. It is not very efficient for
several reasons:
 * We have to insert and remove tree item each time a new version is created or an old version
deleted. This leads to a considerable tree operations number increasing as well as tree splits
and merges operations.
 * Also this leads to a contention on leaf page write lock - each update operation has
to obtain exclusive access to insert a new version of row. During this update no body on that
leaf can not be able to update or even read data of neighbour keys.
 * Primary key tree consumes more space if it stores each version.
 * Other vendors do not store each version in primary indexes (as far as I know).

Possible solution - store only key and link to the newest version in the primary index. Instead
of this {{CacheDataTree}} item
{{|           key part          |           |        
|}}
{{|-----------------------------|  lockVer  |   link  |}}
{{| cache_id | hash |  mvccVer  |           |         |}}
 we'll have:
{{|   key part      |           |  link to the   |}}
{{|-----------------|  lockVer  |     newest     |}}
{{| cache_id | hash |           |    version     |}}
Note, we do not have mvccVer in tree item. Link to the newer version leads to the most recent
inserted data item. To find older versions, each DatRow is provided with "prevLink" - the
link to previous version of row. DataRow layout can be changed from

| header | xid_max | xid_min | KV bytes |

to the next one:

| header | xid_max | xid_min | *PREV_LINK* | KV bytes |

Where *PREV_LINK* field points to the previous version of the row.  Traversing this prevRow
links we can iterate over all available versions without affecting primary key tree.

When the new version is created we just insert the new row in datastore, then do CAS on the
link to the newest version in primary key tree in order it points to the new row. PrevLink
in the new row should point on the previous one. That is all. We've just inserted a new version
just with a long field CAS in the CacheDataTree. Without obtaining write lock and other headache.

Secondary indexes are handled in the same manner as before.


> MVCC: Use linked lists to store multiple versions.
> --------------------------------------------------
>
>                 Key: IGNITE-9592
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9592
>             Project: Ignite
>          Issue Type: Improvement
>          Components: mvcc
>            Reporter: Roman Kondakov
>            Priority: Major
>              Labels: performance
>
> Currently we store all versions of each row in primary index. It is not very efficient
for several reasons:
>  * We have to insert and remove tree item each time a new version is created or an old
version deleted. This leads to a considerable tree operations number increasing as well as
extra tree splits and merges operations.
>  * Also this leads to a contention on leaf page write lock - each update operation
has to obtain exclusive access to insert a new version or remove old version of row. During
this update no body on that leaf can not be able to update or even read data of neighbour
keys.
>  * Primary key tree consumes more space if it stores each version.
>  * Other vendors do not store each version in primary indexes (as far as I know).
> Possible solution - store only key and link to the newest version in the primary index.
Instead of this {{CacheDataTree}} item
>  {{|           key part          |           |        
|}}
>  {{|-----------------------------|  lockVer  |   link  |}}
>  {{| cache_id | hash |  mvccVer  |           |         |}}
>   we'll have:
>  {{|   key part      |           |  link to the   |}}
>  {{|-----------------|  lockVer  |     newest     |}}
>  {{| cache_id | hash |           |    version     |}}
>  Note, we do not have mvccVer in tree item. Link to the newer version leads to the most
recent inserted data item. To find older versions, each DatRow is provided with "prevLink"
- the link to previous version of row. DataRow layout can be changed from
> |header|xid_max|xid_min |KV bytes|
> to the next one:
> |header|xid_max|xid_min|*PREV_LINK*|KV bytes|
> Where *PREV_LINK* field points to the previous version of the row.  Traversing this
prevRow links we can iterate over all available versions without affecting primary key tree.
> When the new version is created we just insert the new row in datastore, then do CAS
on the link to the newest version in primary key tree in order it points to the new row. PrevLink
in the new row should point on the previous one. That is all. We've just inserted a new version
just with a long field CAS in the CacheDataTree. Without obtaining write lock and other headache.
> Secondary indexes are handled in the same manner as before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message