ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Rework locking architecture for MVCC and transactional SQL
Date Thu, 14 Dec 2017 07:21:39 GMT
Denis,

Sorry, maybe I was not clear enough - the "tuple approach" and the "persistent
approach" are the same thing. By "tuple" I mean a row stored inside a data block.
Currently we store lock information in the Java heap, and the proposal is to move
it into the data blocks. The main driver is memory: if there are many rows to be
locked, we will either run out of memory or create serious memory pressure. For
example, currently an update of 1M entries consumes ~500 MB of heap. With the
proposed approach it will consume almost nothing. The drawback is an increased
number of dirty data pages, but this should not be a problem, because in the
final implementation we will update data rows before the prepare phase anyway,
so I do not expect any write amplification in the usual case.
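The ~500 MB figure above can be sanity-checked with back-of-the-envelope
arithmetic. The per-entry overhead below is an assumption for illustration, not
a measurement of GridCacheMvccManager itself:

```java
// Rough heap estimate for per-entry lock metadata. The ~500 bytes per
// entry (map node, key reference, MVCC candidate with XID and wait-list
// head) is an assumed figure, not measured from Ignite.
public class LockHeapEstimate {
    public static void main(String[] args) {
        long entries = 1_000_000L;
        long bytesPerEntry = 500;
        long totalMb = entries * bytesPerEntry / (1024 * 1024);
        // ~476 MB, i.e. roughly the ~500 MB quoted above.
        System.out.println("~" + totalMb + " MB of heap for lock metadata");
    }
}
```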

This approach is only applicable for Ignite persistence.

On Thu, Dec 14, 2017 at 1:53 AM, Denis Magda <dmagda@apache.org> wrote:

> Vladimir,
>
> Thanks for a thorough overview and proposal.
>
> > Also, we could try employing a tiered approach:
> > 1) Try to keep everything in memory to minimize writes to blocks.
> > 2) Fall back to persistent lock data if a certain threshold is reached.
>
> What are the benefits of the backed-by-persistence approach compared to
> the one based on tuples? Specifically:
> - will the persistence approach work for both 3rd party and Ignite
> persistence?
> - any performance impacts depending on a chosen method?
> - what’s faster to implement?
>
> —
> Denis
>
> > On Dec 13, 2017, at 2:10 AM, Vladimir Ozerov <vozerov@gridgain.com>
> wrote:
> >
> > Igniters,
> >
> > As you probably know, we are actively working on the MVCC [1] and
> > transactional SQL [2] features, which could be treated as a single huge
> > improvement. We face a number of challenges, and one of them is locking.
> >
> > At the moment, information about all locks is kept in memory on a
> > per-entry basis (see GridCacheMvccManager). For every locked key we
> > maintain the current lock owner (XID) and the list of would-be-owner
> > transactions. When a transaction is about to lock an entry, two scenarios
> > are possible:
> > 1) If the entry is not locked, we obtain the lock immediately.
> > 2) If the entry is locked, we add the current transaction to the wait
> > list and jump to the next entry to be locked. Once the first entry is
> > released by the conflicting transaction, the current transaction becomes
> > the owner of the first entry and tries to promote itself for the
> > subsequent entries.
> >
> > Once all required locks are obtained, a response is sent to the caller.
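The two scenarios above can be sketched as a tiny model of per-entry lock
bookkeeping. Names are illustrative, not the actual GridCacheMvccManager API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of per-entry lock state: one owner XID plus a FIFO
// queue of would-be owners, as described in the message above.
class EntryLock {
    Long ownerXid;                                   // null = not locked
    final Deque<Long> waiters = new ArrayDeque<>();  // would-be owners

    // Scenario 1: entry free -> lock immediately (returns true).
    // Scenario 2: entry locked -> enqueue and move on (returns false).
    boolean tryLock(long xid) {
        if (ownerXid == null) {
            ownerXid = xid;
            return true;
        }
        waiters.add(xid);
        return false;
    }

    // On release, the first waiter is promoted to owner, mirroring the
    // "becomes an owner ... tries to promote itself" step.
    void unlock() {
        ownerXid = waiters.poll(); // null if nobody is waiting
    }
}
```

For example, if tx 1 locks an entry and tx 2 then tries the same entry, tx 2
is queued; when tx 1 unlocks, tx 2 is promoted to owner.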
> >
> > This approach doesn't work well for transactional SQL: if we update
> > millions of rows in a single transaction, we will simply run out of
> > memory. To mitigate this problem, other database vendors keep information
> > about locks inside the tuples. I propose to apply a similar design as
> > follows:
> >
> > 1) No per-entry lock information is stored in memory anymore.
> > 2) The list of active transactions is still maintained in memory.
> > 3) When a TX locks an entry, it sets a special marker on the tuple [3].
> > 4) When a TX meets an already locked entry, it enlists itself in the
> > wait queue of the conflicting transaction and suspends.
> > 5) When the first transaction releases the conflicting lock, it notifies
> > and wakes up the suspended transactions, so they resume locking.
> > 6) Entry lock data is cleared on transaction commit.
> > 7) Entry lock data is not cleared on rollback or node restart; instead,
> > we could use the active transaction list to identify invalid locks and
> > overwrite them as needed.
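Steps 3, 6, and 7 can be sketched as follows. This is a hedged illustration of
the idea, assuming a per-tuple XID marker (one of the options in [3]); the
class and field names are hypothetical, not Ignite's actual design:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of tuple-level locking: the tuple carries a lock XID, and the
// in-memory active transaction list decides whether that XID is still
// meaningful, so stale markers left by rollback or restart need no
// eager cleanup (step 7).
class TupleLockSketch {
    static class Tuple { long lockXid; /* 0 = never locked */ }

    final Set<Long> activeTxs = new HashSet<>();

    // Step 3 + step 7: lock if the tuple is free OR its marker belongs
    // to a transaction that is no longer active; overwrite stale markers.
    boolean tryLock(Tuple t, long xid) {
        if (t.lockXid == 0 || !activeTxs.contains(t.lockXid)) {
            t.lockXid = xid;      // write marker into the data page
            return true;
        }
        return t.lockXid == xid;  // re-entrant for the same tx
    }

    // Step 6: commit clears the markers; rollback intentionally does not.
    void commit(long xid, List<Tuple> touched) {
        for (Tuple t : touched)
            if (t.lockXid == xid)
                t.lockXid = 0;
        activeTxs.remove(xid);
    }
}
```

Note how a marker left by a rolled-back or crashed transaction is simply
overwritten by the next locker once the XID disappears from the active list.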
> >
> > Also, we could try employing a tiered approach:
> > 1) Try to keep everything in memory to minimize writes to blocks.
> > 2) Fall back to persistent lock data if a certain threshold is reached.
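The tiered idea above could look roughly like this. The threshold value and
the page-writer interface are assumptions made up for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Tiered lock table sketch: keep lock records on-heap until a threshold,
// then spill markers into data pages. MAX_IN_MEMORY is an arbitrary
// assumed threshold; PageWriter stands in for the persistence layer.
class TieredLockTable {
    static final int MAX_IN_MEMORY = 10_000;

    interface PageWriter { void writeLockMarker(long rowId, long xid); }

    final Map<Long, Long> heapLocks = new HashMap<>(); // rowId -> xid
    final PageWriter pageWriter;

    TieredLockTable(PageWriter w) { pageWriter = w; }

    void lock(long rowId, long xid) {
        if (heapLocks.size() < MAX_IN_MEMORY)
            heapLocks.put(rowId, xid);              // tier 1: no dirty pages
        else
            pageWriter.writeLockMarker(rowId, xid); // tier 2: spill to block
    }
}
```

Small transactions stay entirely in tier 1 and dirty no pages; only a
million-row SQL update crosses the threshold and pays the block-write cost.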
> >
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-3478
> > [2] https://issues.apache.org/jira/browse/IGNITE-4191
> > [3] Depends on final MVCC design - it could be per-tuple XID, undo
> vectors,
> > per-block transaction lists, etc..
> >
> > Vladimir.
>
>
