directory-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From elecha...@apache.org
Subject svn commit: r1525247 [1/3] - in /directory/site/trunk/content/mavibot/user-guide: 2.1-file-format.mdtext images/nodeLeaf.graphml images/nodeLeaf.png images/nodeLeaf2.graphml images/nodeLeaf2.png
Date Sat, 21 Sep 2013 14:54:58 GMT
Author: elecharny
Date: Sat Sep 21 14:54:57 2013
New Revision: 1525247

URL: http://svn.apache.org/r1525247
Log:
added some content

Added:
    directory/site/trunk/content/mavibot/user-guide/images/nodeLeaf2.graphml
    directory/site/trunk/content/mavibot/user-guide/images/nodeLeaf2.png   (with props)
Modified:
    directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext
    directory/site/trunk/content/mavibot/user-guide/images/nodeLeaf.graphml
    directory/site/trunk/content/mavibot/user-guide/images/nodeLeaf.png

Modified: directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext
URL: http://svn.apache.org/viewvc/directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext?rev=1525247&r1=1525246&r2=1525247&view=diff
==============================================================================
--- directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext (original)
+++ directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext Sat Sep 21 14:54:57
2013
@@ -119,7 +119,7 @@ Note that each *BTreeHeader* has at leas
 
 #### The Nodes and Leaves
 
-Nodes and Leaves are logical *BTree* pages which are serialized on disk into one to many
*PageIO*s. They have slightly different data structures, as *Node*s contains pointers to *Leaves*,
and no data, while *Leaves* contains data. In any case, both contain the keys.
+Nodes and Leaves are logical *BTree* pages which are serialized on disk into one to many
*PageIO*s. They have slightly different data structures, as *Node*s contains pointers to *Leaves*,
and no data, while *Leaves* contains data. In any case, both contain the keys. The *Node*
has one ore value than the *Leaf*, too.
 
 On disk, each *Node* and *Leaf* are stored in *PageIO*s, as we said. A *Node* will have pointers
to some other logical pages, and on disk, those pointers will be offset of the first *PageIO*
used to store the logical page it points to.
 
@@ -128,4 +128,20 @@ Here is the *Node* and *Leaf* data struc
 ![Node and Leaf](images/nodeLeaf.png)
 
 
+Note that we store the size of the serialized data : this is necessary as we have to know
how many *PageIO*s will be needed to store the logical page.
+
+The *rootPage* is just a *Node* or a *Leaf*.
+
+#### Potential improvement
+
+We can get better performance by serializing the data differently. Instead of storing keys
and values as byte arrays prefixed by their length, we could store an array of keys and values'
offsets before the associated byte[]. Here is the resulting data structure, once serialized
:
+
+![Node and Leaf, improved](images/nodeLeaf2.png)
+
+(The *Node* is not described, as it's basically the same data structure, but with one extra
value).
+
+It does not need more space to serialize the data this way, as the offsets are ints, and
in the previous version, those ints are used to store the length of the keys and values anyway.
+
+The gain is that we can have access to a given key and value without having to read all the
previous keys and values. Also we can now read a leaf or a node without having to deserialize
all the keys and values they contain.
+
 



Mime
View raw message