directory-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From build...@apache.org
Subject svn commit: r891500 - in /websites/staging/directory/trunk/content: ./ mavibot/user-guide/1.1-btree-basics.html mavibot/user-guide/2-btree-types.html
Date Sun, 22 Dec 2013 22:29:41 GMT
Author: buildbot
Date: Sun Dec 22 22:29:41 2013
New Revision: 891500

Log:
Staging update by buildbot for directory

Modified:
    websites/staging/directory/trunk/content/   (props changed)
    websites/staging/directory/trunk/content/mavibot/user-guide/1.1-btree-basics.html
    websites/staging/directory/trunk/content/mavibot/user-guide/2-btree-types.html

Propchange: websites/staging/directory/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Dec 22 22:29:41 2013
@@ -1 +1 @@
-1552893
+1553042

Modified: websites/staging/directory/trunk/content/mavibot/user-guide/1.1-btree-basics.html
==============================================================================
--- websites/staging/directory/trunk/content/mavibot/user-guide/1.1-btree-basics.html (original)
+++ websites/staging/directory/trunk/content/mavibot/user-guide/1.1-btree-basics.html Sun
Dec 22 22:29:41 2013
@@ -152,16 +152,19 @@
 
 
 <h1 id="11-b-tree-basics">1.1 - B-tree Basics</h1>
-<p>A <strong>B-tree</strong> "tree data structure that keeps data sorted
and allows searches, sequential access, insertions, and deletions in logarithmic time." (see
<a href="http://en.wikipedia.org/wiki/B-tree">Wikipedia</a>)</p>
-<p>The important point here is the last one : it guarantees <strong>O(logn)</strong>
operations, compared to any other data structures (a hashed data structure offers <strong>O(n)</strong>
average operations, but can degenerate to <strong>O(n2)</strong>, and ordering
is not kept. </p>
+<p><strong>B-tree</strong> was invented back in 1971 by <strong>Bayer</strong>
and <strong>McCreight</strong>  (the <strong>B</strong> does not main
anything know, speculations are that it comes either form the <strong>B</strong>
of <strong>Bayer</strong> or from <strong>Boing</strong> they were
working for back then). It was an extension to binary search tree, which was just able to
store 2 elements per page.</p>
+<p>A <strong>B-tree</strong> is a "tree data structure that keeps data
sorted and allows searches, sequential access, insertions, and deletions in logarithmic time."
(see <a href="http://en.wikipedia.org/wiki/B-tree">Wikipedia</a>)</p>
+<p>The important point here is the last one : it guarantees <strong>O(logn)</strong>
operations, compared to any other data structures (a hashed data structure offers <strong>O(n)</strong>
average operations, but can degenerate to <strong>O(n2)</strong>, and ordering
is not kept. But there is more : by gathering N elements in a page, which leads to  more comparisons
than necessary, but still save a lot of disk accesses, as those comparison will be done in
memory when a page is fully loaded. That's the reason for <strong>B-trees</strong>
to have been invented. </p>
 <p><strong>B-trees</strong> are everywhere : databases, OS, etc. It's a
critical data structure when you are to deal with a huge number of data.</p>
 <p>1.1.1 - Inside a B-tree</p>
-<p>A <strong>B-Tree</strong> contains <strong>Nodes</strong>
and <strong>Leaves</strong>. A <em>Node</em> points to other <strong>Nodes</strong>
or <strong>Leaves</strong>. <strong>Leaves</strong> contains <strong>Values</strong>.
Both <strong>Nodes</strong> and <strong>Leaves</strong> have <strong>Keys</strong>
that are associated with <em>Values</em>.</p>
+<p>A <strong>B-Tree</strong> contains <strong>Nodes</strong>
and <strong>Leaves</strong>. A <em>Node</em> points to other <strong>Nodes</strong>
or <strong>Leaves</strong>. <strong>Leaves</strong> contains <strong>Values</strong>.
Both <strong>Nodes</strong> and <strong>Leaves</strong> have <strong>Keys</strong>
that are associated with <em>Values</em>. <strong>Keys</strong> are
not copied in many pages, they just appear once in the whole <strong>B-tree</strong>.</p>
 <p>Pretty simple !</p>
 <p>One last thing : <strong>Keys</strong> are ordered, and this is the
condition for the easy and fast retrieval of <strong>Values</strong>.</p>
 <p>A few more rules are enforced :
 <em> A <strong>Node</strong> and a <strong>Leaf</strong> contains
up to N values (N being generally a power of 2, so 2, 4, 8, 16...).
 </em> You can't have less than N/2 <strong>Values</strong> or <strong>keys</strong>
in a <strong>Leaf</strong> or a <strong>Node</strong>, except for
the root <strong>Node</strong>.</p>
+<p>1.1.2 - Concurrent access</p>
+<p>The real problem with <strong>B-trees</strong> is that we need to use
some locks to allow concurrent access to the data, especially when some updates occur. The
rational is that when browsing a <strong>B-tree</strong>, changing it will just
break the browse operation. There are some technics to avoid having such an issue, up to a
point no read locks are needed, assuming some extra information is added to the pages : a
pointer to the next page at the same level. Sadly, it comes with drawbacks, like it's difficult
to remove empty pages from the <strong>B-tree</strong>.</p>
 
 
     <div class="nav">

Modified: websites/staging/directory/trunk/content/mavibot/user-guide/2-btree-types.html
==============================================================================
--- websites/staging/directory/trunk/content/mavibot/user-guide/2-btree-types.html (original)
+++ websites/staging/directory/trunk/content/mavibot/user-guide/2-btree-types.html Sun Dec
22 22:29:41 2013
@@ -154,10 +154,23 @@
 <h1 id="2-b-tree-flavors">2 - B-tree flavors</h1>
 <p>You have many different flavors of <strong>B-trees</strong> :</p>
 <ul>
-<li>B+tree : we have a pointer to the next <strong>Node</strong> in each
<strong>Node</strong></li>
-<li>B*-tree : the internal <strong>Nodes</strong> are compacted, to contain
at least 2/3 of the number of possible slots</li>
-<li>Counted B*-tree : the <strong>B-tree</strong></li>
+<li>B+tree</li>
+<li>B*-tree</li>
+<li>Counted B-tree</li>
+<li>MVCC B-tree</li>
 </ul>
+<h2 id="21-btree">2.1 - B+tree</h2>
+<p>This is a <strong>B-tree</strong> which does not store values in the
<strong>Nodes</strong>, and a link between <strong>leaves</strong>
is added, to fasten the browsing : no need to go up to the parent's node to get the next value
when reaching the end of a leaf. Also the <strong>nodes</strong> don't contain
value anymore.</p>
+<h2 id="22-btree">2.2 - B*tree</h2>
+<p>A slightly different form of <strong>B+tree</strong>, where the number
of element per page is enforced to be at leat 2/3rd of the maximum numbers of elements the
page can hold. It fasten a bit the retrieval of elements, as we have a denser tree.</p>
+<h2 id="23-counted-b-tree">2.3 - Counted B-tree</h2>
+<p>Another slightly different implementation of a <strong>B+tree</strong>,
where the number of elements stored in the descendants is stored withing each key. This allows
an easy count of elements on the left and right to any element, at the price of a additional
piece of information being added on each page.</p>
+<h2 id="24-mvcc-b-tree">2.4 - MVCC B-tree</h2>
+<p>This is a new 'style' of <strong>B+tree</strong>, where the structure
is exactly the same than a simple <strong>B+tree</strong>, except that we keep
old versions alive, until nobody use them. The idea is that a new revision of the <strong>B+tree</strong>
is added only when the updates are fully done. This has the extremely intersting characteristic
that there is no need of locks for read and writes, assuming that writes are not concurrent
(they are serialized).</p>
+<p>In other words, you may have many reads at the same time, still being able to update
the data.</p>
+<p>It comes to a price though : a lot of pages are copied for every updates, as we
create a new copy of every modified pages, up to the root page. </p>
+<p>We also have to manage old pages that can be removed as they belong to unused versions,
which requires an extra work.</p>
+<p><strong>Mavibot</strong> is a Java based implementation of a <strong>MVCC
B+tree</strong>.</p>
 
 
     <div class="nav">



Mime
View raw message