ignite-issues mailing list archives

From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-11433) MVCC: Link entry versions at the Data Store layer.
Date Mon, 11 Mar 2019 07:37:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Ozerov updated IGNITE-11433:
-------------------------------------
    Issue Type: Task  (was: Improvement)

> MVCC: Link entry versions at the Data Store layer.
> --------------------------------------------------
>
>                 Key: IGNITE-11433
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11433
>             Project: Ignite
>          Issue Type: Task
>          Components: mvcc, sql
>            Reporter: Igor Seliverstov
>            Priority: Major
>
> Currently, all tuple versions are placed inside index trees. CacheDataTree is used to link versions to each other (using their order inside an index page).
> Although this approach is easy to implement and preferable as a first step, it has several disadvantages:
> 1) We need to iterate over tuple versions at update time under a read (or even write) lock on an index page, which blocks other write (read) operations for a relatively long period of time.
> 2) Write amplification affects not only the Data Store layer but indexes as well, which makes read/lookup operations on indexes much slower.
> 3) We cannot implement several important improvements (data streamer optimizations) because having several versions of one key in an index page prevents the use of Invoke operations.
> Linking versions at the Data Store layer only (as other vendors do) solves these issues or reduces their impact.
> So, the proposed changes are:
> 1) Change the data page layout by adding two fields to its header: {{link}} (a link to the next tuple in a versions chain) and {{lock}} (the tx which holds a write lock on the HEAD of the chain). There are several possible optimizations: 1) leave the lock as is (in the cache index item); 2) use the max version as the lock version as well.
> 2) Do not save all versions of a tuple in indexes; this means removing the version from the index key, so the newest version overwrites the existing entry.
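The two proposed changes can be modeled in a few lines. This is a minimal sketch, not Ignite code: the class `LinkedVersionsLayout`, its fields, the `0` sentinels, and the `Map`-based "data pages" are all hypothetical. It assumes the chain runs from the newest version to older ones, that tx versions grow monotonically, and it does not model commit beyond releasing the lock. It shows a tuple header carrying the proposed {{link}} and {{lock}} fields, and an index keyed by the cache key alone, so each write overwrites the single index entry instead of adding a versioned one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the proposed data page fields; not Ignite code. */
public class LinkedVersionsLayout {
    static final long NO_LINK = 0L;          // assumed sentinel: end of the versions chain
    static final long NO_LOCK = 0L;          // assumed sentinel: chain HEAD is not locked
    static final long LIVE = Long.MAX_VALUE; // assumed maxVer of a version not yet superseded

    /** Tuple header in a data page: MVCC versions plus the proposed link and lock fields. */
    static final class Tuple {
        final long minVer;   // version of the tx that created this tuple version
        long maxVer = LIVE;  // version of the tx that superseded it
        final long link;     // proposed "link": the next (older) tuple in the versions chain
        long lock = NO_LOCK; // proposed "lock": tx holding a write lock on the chain HEAD
        final String value;

        Tuple(long minVer, long link, String value) {
            this.minVer = minVer;
            this.link = link;
            this.value = value;
        }
    }

    /** Data pages simulated as link -> tuple; a real link would encode a page id and item index. */
    final Map<Long, Tuple> dataPages = new HashMap<>();
    /** Index keyed by the cache key only (no version in the key): one entry per key. */
    final Map<String, Long> index = new HashMap<>();
    private long nextLink = 1;

    private Tuple head(String key) {
        Long headLink = index.get(key);
        return headLink == null ? null : dataPages.get(headLink);
    }

    /** Writes a new HEAD version; returns false if another tx holds the HEAD lock. */
    boolean update(String key, long txVer, String value) {
        Long oldLink = index.get(key);
        Tuple oldHead = head(key);
        if (oldHead != null) {
            if (oldHead.lock != NO_LOCK && oldHead.lock != txVer)
                return false;       // write-write conflict with the lock holder
            oldHead.maxVer = txVer; // the old HEAD is superseded by this tx
            oldHead.lock = NO_LOCK; // the lock moves to the new HEAD
        }
        Tuple newHead = new Tuple(txVer, oldLink == null ? NO_LINK : oldLink, value);
        newHead.lock = txVer;       // held until released by unlock()
        long link = nextLink++;
        dataPages.put(link, newHead);
        index.put(key, link);       // the newest version overwrites the existing index entry
        return true;
    }

    /** Releases the HEAD lock held by txVer, e.g. on commit. */
    void unlock(String key, long txVer) {
        Tuple h = head(key);
        if (h != null && h.lock == txVer)
            h.lock = NO_LOCK;
    }

    /** Number of versions reachable from the index entry by following the link field. */
    int chainLength(String key) {
        Long headLink = index.get(key);
        long link = headLink == null ? NO_LINK : headLink;
        int n = 0;
        while (link != NO_LINK) {
            n++;
            link = dataPages.get(link).link;
        }
        return n;
    }

    public static void main(String[] args) {
        LinkedVersionsLayout store = new LinkedVersionsLayout();
        store.update("k", 10, "v1");
        store.unlock("k", 10);
        store.update("k", 20, "v2");
        // prints "1 index entry, chain length 2"
        System.out.println(store.index.size() + " index entry, chain length " + store.chainLength("k"));
    }
}
```

After two writes the index still holds one entry for `"k"`, while both versions remain reachable through the data-page links; a third writer is rejected while tx 20 still holds the HEAD lock.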
> There are two approaches to linking versions, each with pros and cons:
> 1) N2O (newer to older): a reader (writer) gets the newest tuple version first and iterates over tuple versions from newer to older until it reaches a version whose min and max versions bracket its snapshot. This approach implies faster reads (more recent versions are fetched first) but requires updating all involved indexes on each write operation; in other words, writes are slower (this may be optimized using logical pointers to the head of the tuple versions chain). Cooperative VAC (update operations remove tuple versions that are invisible to all readers) is possible.
> 2) O2N (older to newer): a reader gets the oldest tuple version and iterates over versions from older to newer until it reaches the visible version. This avoids updating all indexes on each write (except when an indexed value changes), so write operations become lighter. Cooperative VAC is almost impossible.
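The N2O read path and cooperative VAC can be sketched as follows. This is a hypothetical illustration, not Ignite's implementation: `N2OChain` and its members are invented names, chain links are plain object references instead of page links, visibility is simplified to `minVer <= snapshot < maxVer`, and tx versions are assumed to grow monotonically. Under that last assumption `maxVer` never increases along the chain, so once one version is invisible to the oldest active snapshot, every older version is too and the tail can be cut in one step.

```java
/** Hypothetical N2O versions chain: the index entry points at the newest version. */
public class N2OChain {
    static final long LIVE = Long.MAX_VALUE; // assumed maxVer of the newest version

    static final class Version {
        final long minVer;  // version of the creating tx
        long maxVer = LIVE; // version of the superseding tx
        Version older;      // link to the next (older) tuple in the chain
        final String value;

        Version(long minVer, Version older, String value) {
            this.minVer = minVer;
            this.older = older;
            this.value = value;
        }
    }

    Version head;     // what every index entry for this key would point to
    int indexUpdates; // each new HEAD forces the index entries to be repointed

    void write(long txVer, String value) {
        if (head != null)
            head.maxVer = txVer; // the previous HEAD is superseded
        head = new Version(txVer, head, value);
        indexUpdates++;
    }

    /** Walks newer -> older until a version satisfies minVer <= snapshot < maxVer. */
    String read(long snapshot) {
        for (Version v = head; v != null; v = v.older)
            if (v.minVer <= snapshot && snapshot < v.maxVer)
                return v.value;
        return null; // no version is visible to this snapshot
    }

    /** Cooperative VAC: drops versions with maxVer <= oldestActiveSnapshot; returns how many. */
    int vacuum(long oldestActiveSnapshot) {
        for (Version v = head; v != null; v = v.older) {
            if (v.older != null && v.older.maxVer <= oldestActiveSnapshot) {
                int removed = 0;
                for (Version dead = v.older; dead != null; dead = dead.older)
                    removed++;
                v.older = null; // everything older is invisible to every active reader
                return removed;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        N2OChain chain = new N2OChain();
        chain.write(10, "v1");
        chain.write(20, "v2");
        chain.write(30, "v3");
        System.out.println(chain.read(25));   // prints "v2"
        System.out.println(chain.vacuum(25)); // prints "1" (only v1, maxVer 20, is dead)
        System.out.println(chain.read(25));   // prints "v2"
    }
}
```

A reader with a recent snapshot stops at the first element, which is the read-side advantage the ticket attributes to N2O; the cost shows up in `indexUpdates`, which grows with every write.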
> We need to decide which approach to use depending on which load profile is preferable (OLTP/OLAP).
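To make the write-side trade-off behind this decision concrete, here is a minimal O2N sketch. As before it is hypothetical, not Ignite code: `O2NChain` and its members are invented names, links are object references, visibility is simplified to `minVer <= snapshot < maxVer`, and the indexed value is assumed never to change. The index entry is created once and keeps pointing at the oldest version; later writes append to the end of the chain without touching it, while readers with recent snapshots must walk past older versions.

```java
/** Hypothetical O2N versions chain: the index entry points at the oldest version. */
public class O2NChain {
    static final long LIVE = Long.MAX_VALUE; // assumed maxVer of the newest version

    static final class Version {
        final long minVer;  // version of the creating tx
        long maxVer = LIVE; // version of the superseding tx
        Version newer;      // link to the next (newer) tuple in the chain
        final String value;

        Version(long minVer, String value) {
            this.minVer = minVer;
            this.value = value;
        }
    }

    Version oldest;   // the index entry keeps pointing here
    int indexUpdates; // how many times the index entry had to change

    void write(long txVer, String value) {
        Version v = new Version(txVer, value);
        if (oldest == null) {
            oldest = v;
            indexUpdates++; // only the first version creates the index entry
            return;
        }
        Version newest = oldest;
        while (newest.newer != null) // walk older -> newer to the end of the chain
            newest = newest.newer;
        newest.maxVer = txVer;       // the previous newest version is superseded
        newest.newer = v;            // append; the index entry is left untouched
    }

    /** Walks older -> newer until a version satisfies minVer <= snapshot < maxVer. */
    String read(long snapshot) {
        for (Version v = oldest; v != null; v = v.newer)
            if (v.minVer <= snapshot && snapshot < v.maxVer)
                return v.value;
        return null; // no version is visible to this snapshot
    }

    public static void main(String[] args) {
        O2NChain chain = new O2NChain();
        chain.write(10, "v1");
        chain.write(20, "v2");
        chain.write(30, "v3");
        System.out.println(chain.read(25));     // prints "v2"
        System.out.println(chain.indexUpdates); // prints "1"
    }
}
```

In this sketch, reclaiming the oldest versions would mean repointing the index entry, which is exactly the index write O2N tries to avoid; that is one plausible reading of why the ticket calls cooperative VAC almost impossible here.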



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
