directory-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From elecha...@apache.org
Subject svn commit: r1553042 - in /directory/site/trunk/content/mavibot/user-guide: 1.1-btree-basics.mdtext 2-btree-types.mdtext
Date Sun, 22 Dec 2013 22:29:34 GMT
Author: elecharny
Date: Sun Dec 22 22:29:33 2013
New Revision: 1553042

URL: http://svn.apache.org/r1553042
Log:
added content

Modified:
    directory/site/trunk/content/mavibot/user-guide/1.1-btree-basics.mdtext
    directory/site/trunk/content/mavibot/user-guide/2-btree-types.mdtext

Modified: directory/site/trunk/content/mavibot/user-guide/1.1-btree-basics.mdtext
URL: http://svn.apache.org/viewvc/directory/site/trunk/content/mavibot/user-guide/1.1-btree-basics.mdtext?rev=1553042&r1=1553041&r2=1553042&view=diff
==============================================================================
--- directory/site/trunk/content/mavibot/user-guide/1.1-btree-basics.mdtext (original)
+++ directory/site/trunk/content/mavibot/user-guide/1.1-btree-basics.mdtext Sun Dec 22 22:29:33
2013
@@ -22,15 +22,17 @@ Notice: Licensed to the Apache Software 
 
 # 1.1 - B-tree Basics
 
-A **B-tree** "tree data structure that keeps data sorted and allows searches, sequential
access, insertions, and deletions in logarithmic time." (see [Wikipedia](http://en.wikipedia.org/wiki/B-tree))
+**B-tree** was invented back in 1971 by **Bayer** and **McCreight**  (the **B** does not
main anything know, speculations are that it comes either form the **B** of **Bayer** or from
**Boing** they were working for back then). It was an extension to binary search tree, which
was just able to store 2 elements per page.
 
-The important point here is the last one : it guarantees **O(logn)** operations, compared
to any other data structures (a hashed data structure offers **O(n)** average operations,
but can degenerate to **O(n2)**, and ordering is not kept. 
+A **B-tree** is a "tree data structure that keeps data sorted and allows searches, sequential
access, insertions, and deletions in logarithmic time." (see [Wikipedia](http://en.wikipedia.org/wiki/B-tree))
+
+The important point here is the last one : it guarantees **O(logn)** operations, compared
to any other data structures (a hashed data structure offers **O(n)** average operations,
but can degenerate to **O(n2)**, and ordering is not kept. But there is more : by gathering
N elements in a page, which leads to  more comparisons than necessary, but still save a lot
of disk accesses, as those comparison will be done in memory when a page is fully loaded.
That's the reason for **B-trees** to have been invented. 
 
 **B-trees** are everywhere : databases, OS, etc. It's a critical data structure when you
are to deal with a huge number of data.
 
 1.1.1 - Inside a B-tree
 
-A **B-Tree** contains **Nodes** and **Leaves**. A *Node* points to other **Nodes** or **Leaves**.
**Leaves** contains **Values**. Both **Nodes** and **Leaves** have **Keys** that are associated
with *Values*.
+A **B-Tree** contains **Nodes** and **Leaves**. A *Node* points to other **Nodes** or **Leaves**.
**Leaves** contains **Values**. Both **Nodes** and **Leaves** have **Keys** that are associated
with *Values*. **Keys** are not copied in many pages, they just appear once in the whole **B-tree**.
 
 Pretty simple !
 
@@ -40,3 +42,11 @@ A few more rules are enforced :
 * A **Node** and a **Leaf** contains up to N values (N being generally a power of 2, so 2,
4, 8, 16...).
 * You can't have less than N/2 **Values** or **keys** in a **Leaf** or a **Node**, except
for the root **Node**.
 
+
+
+1.1.2 - Concurrent access
+
+The real problem with **B-trees** is that we need to use some locks to allow concurrent access
to the data, especially when some updates occur. The rational is that when browsing a **B-tree**,
changing it will just break the browse operation. There are some technics to avoid having
such an issue, up to a point no read locks are needed, assuming some extra information is
added to the pages : a pointer to the next page at the same level. Sadly, it comes with drawbacks,
like it's difficult to remove empty pages from the **B-tree**.
+
+
+

Modified: directory/site/trunk/content/mavibot/user-guide/2-btree-types.mdtext
URL: http://svn.apache.org/viewvc/directory/site/trunk/content/mavibot/user-guide/2-btree-types.mdtext?rev=1553042&r1=1553041&r2=1553042&view=diff
==============================================================================
--- directory/site/trunk/content/mavibot/user-guide/2-btree-types.mdtext (original)
+++ directory/site/trunk/content/mavibot/user-guide/2-btree-types.mdtext Sun Dec 22 22:29:33
2013
@@ -26,6 +26,32 @@ Notice: Licensed to the Apache Software 
 
 You have many different flavors of **B-trees** :
 
-* B+tree : we have a pointer to the next **Node** in each **Node**
-* B*-tree : the internal **Nodes** are compacted, to contain at least 2/3 of the number of
possible slots
-* Counted B*-tree : the **B-tree**
\ No newline at end of file
+* B+tree
+* B*-tree
+* Counted B-tree
+* MVCC B-tree
+
+
+## 2.1 - B+tree
+
+This is a **B-tree** which does not store values in the **Nodes**, and a link between **leaves**
is added, to fasten the browsing : no need to go up to the parent's node to get the next value
when reaching the end of a leaf. Also the **nodes** don't contain value anymore.
+
+## 2.2 - B*tree
+
+A slightly different form of **B+tree**, where the number of element per page is enforced
to be at leat 2/3rd of the maximum numbers of elements the page can hold. It fasten a bit
the retrieval of elements, as we have a denser tree.
+
+## 2.3 - Counted B-tree
+
+Another slightly different implementation of a **B+tree**, where the number of elements stored
in the descendants is stored withing each key. This allows an easy count of elements on the
left and right to any element, at the price of a additional piece of information being added
on each page.
+
+## 2.4 - MVCC B-tree
+
+This is a new 'style' of **B+tree**, where the structure is exactly the same than a simple
**B+tree**, except that we keep old versions alive, until nobody use them. The idea is that
a new revision of the **B+tree** is added only when the updates are fully done. This has the
extremely intersting characteristic that there is no need of locks for read and writes, assuming
that writes are not concurrent (they are serialized).
+
+In other words, you may have many reads at the same time, still being able to update the
data.
+
+It comes to a price though : a lot of pages are copied for every updates, as we create a
new copy of every modified pages, up to the root page. 
+
+We also have to manage old pages that can be removed as they belong to unused versions, which
requires an extra work.
+
+**Mavibot** is a Java based implementation of a **MVCC B+tree**.
\ No newline at end of file



Mime
View raw message