directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kiran Ayyagari <>
Subject Re: [Mavibot] Transaction support
Date Fri, 30 Oct 2015 03:51:10 GMT
On Thu, Oct 29, 2015 at 6:13 PM, Emmanuel L├ęcharny <>

> Hi guys,
> here are a bit of my last night's insomnia thoughts (with the shower,
> and the smallest room of my appartment, this is the only place and time
> I have for such thoughts atm ;-) :
> - txn are now going to be explicit, ie you have to begin a txn and
> commit it before and after any update, like in :
>         // Inject some data
>         recordManager.beginTransaction();
>         btree.insert( 1L, "1" );
>         btree.insert( 4L, "4" );
>         btree.insert( 2L, "2" );
>         btree.insert( 3L, "3" );
>         btree.insert( 5L, "5" );
>         recordManager.commit();
if a user forgets to call commit() and keeps inserting we will run out of
memory at some point
so wondering what is the better way to prevent that from happening without
maintaining a
log (a WAL or something similar)

> - later on, we may include the txn begin/commit into the update
> operation, on a btree basis.
> - the direct consequence is that we don't write any thing on the disk
> before we commit. This will have a huge impact on performances as every
> single update operation involve many internal updates : BOB Btree, CPB
> Btree, RMH. Typically, in the previous example, we will save 5 updates
> of teh BOB and CPB, plus numerous updates on the RMH, not counting the 4
> updates of the btree root page and btree header.
> - of course, this is not coming for free : if one forget to commit or
> rollback the txn, then the pages will remain in memory...
> - in order to manipulate those pages in memory, we will keep a track of
> the update pages and btree headers in memory. We use Map for that :
>     Map<Long, PageIO> and Map<Long, Page>
> - now, we have two kind of map : one that keep a track on PageIO and the
> other one that keep a track on Page. Why is that ? Because the RMH and
> the BHs are not holding any value, they are specific data structures.
> Every other Page hold  Keys/Values, where value sare data or pointer to
> a child Page.
> - we need a way to reference those Page and PageIO. For PageIO, this is
> easy : we use the page offset. For plain Page, this is a different
> story, because we may create new pages (actually, this *is* what we do
> for every update, except that as we work in memory during a txn, we can
> *modify* an existing page instead of copy it, until we need to write it
> down on disk. Note that the very first time we access to a Page, either
> from teh disk or from the cache, we have to copy it, no matter what).
> - Those new pages don't have an offset, because we haven't written them
> to disk yet. The reason is that writing such a page to the disk means we
> have serialized the page - an expensive operation we want to avoid until
> we have to do it, ie when we write the page to the disk -, and we have
> fetched as many PageIO as necessary to write the page to the disk
> (fetching a new PageIO will first look in the Free Page list or grab new
> PageIO at the end of the file, expending the file).
> - So we have to find a way to identify those pages by other means. One
> idea is to use an incremental ID associated with the txn. Every Page
> will have a Long ID.
> - now, the pb is that we use offset in pages to reference child pages.
> We must be able to distinguish an offset from an ID. We will use a
> negative long for PageID.
> - when we commit the txn, we will have to update all the references to
> IDs to replace them with offsets. At this point, the easiest way is to
> read all the Nodes we keep in the Map, and check all of their values to
> see if we are refering to an ID instead of an Offset, and change that.
> - that drives the order in which we write the pages : we have to start
> from the leaves, and up to the root page
> - another approach would be to keep a track of the modified values for
> each Node (ie, value X has been change is now reffer to page Y, etc...).
> This will be memory consuming, but CPU efficient. All in all, it's
> another Map to handle : Map<PageID, Set<Modified Value position>>. This
> approach is probably what we want to use...
> - modifying a page instead of copying it requires we add new methods in
> the Btree class. Not a huge change.
> - a roolback is just a matter of deleting the txn context: straightforward
> !
> At this point, I think we have a roadmap for txn support. I said nothing
> about managing free pages here, it has already been discussed in another
> thread, but I'll come later with a more explicit mail.
> I'm curently working in a branch, in which I have removed the InMemory
> Btrees and the support for multiple values. The branch has been created,
> but I haven't commit yet anything in it : I have too many broken things.
> I'll commit asap (may be this week-end if I have time to work on my free
> time - familly comes first !)
please commit it even if it is broken, I might be able to understand the
above ideas
if I read it again after looking at code

thanks Emmanuel

> Thanks !

> BOB : Btree-Of-Btrees, the Btree that hold the revisions in use
> CPB : Copied-Pages-Btree, the btree that holds teh list of pages that
> have been copied during an update
> RMH : RecordManager Header, the data structure that maintains the
> pointers to the BOB and CPB, both the current and previous, plus some
> extra informations, like the FreeList page
> BH :  Btree Header, the structure that stors meta-information about a
> B-tree (serializers, comparators, nb elements, pointer to the root page,
> etc)

Kiran Ayyagari

View raw message