directory-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From build...@apache.org
Subject svn commit: r908362 - in /websites/staging/directory/trunk/content: ./ mavibot/user-guide/7.3-serializations.html
Date Thu, 08 May 2014 10:06:30 GMT
Author: buildbot
Date: Thu May  8 10:06:30 2014
New Revision: 908362

Log:
Staging update by buildbot for directory

Modified:
    websites/staging/directory/trunk/content/   (props changed)
    websites/staging/directory/trunk/content/mavibot/user-guide/7.3-serializations.html

Propchange: websites/staging/directory/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu May  8 10:06:30 2014
@@ -1 +1 @@
-1592980
+1593222

Modified: websites/staging/directory/trunk/content/mavibot/user-guide/7.3-serializations.html
==============================================================================
--- websites/staging/directory/trunk/content/mavibot/user-guide/7.3-serializations.html (original)
+++ websites/staging/directory/trunk/content/mavibot/user-guide/7.3-serializations.html Thu
May  8 10:06:30 2014
@@ -152,9 +152,9 @@
 
 
 <h1 id="73-serializations">7.3 - Serializations</h1>
-<p>The logical pages are serialized before being stored on physical pages. This is
a process that should obviously be reversible. W will describe in this chapter how we serialize
<strong>Leaf</strong> and <strong>Node</strong>.</p>
+<p>The logical pages are serialized before storing in physical pages. This is a process
that should obviously be reversible. In this chapter, we will describe how to serialize a
<strong>Leaf</strong> and <strong>Node</strong>.</p>
 <h2 id="leaf-serialization">Leaf serialization</h2>
-<p>A <strong>Leaf</strong> contains some medata data, and a set of keys
and values. A key can have many values, and we have as many keys as values. That being said,
let's describe the internal format of a serialized leaf :</p>
+<p>A <strong>Leaf</strong> contains some metadata, and a set of keys and
values. A key can have many values, and we have as many keys as values. The internal format
of a serialized leaf contains:</p>
 <pre>
     +----------------------------------+
     | revision (long)                  | 8 bytes
@@ -186,56 +186,23 @@
 
 </pre>
 
-<p>We keep the length of each serialized key and value just because we need to provide
a complete byte[] to the key or value deserializer (ie, the exact number of bytes needed to
be able to deserialize the key or the value).</p>
-<p>The <em>dataSize</em> value is used so that we can know how many bytes
we will have to read - thus the number of physical pages to read - in order to get a full
page.</p>
+<p>The length of each serialized key and value is stored so that a complete byte[]
can be passed to the key or value deserializer (ie, the exact number of bytes needed to be
able to deserialize the key or the value).</p>
+<p>The <em>dataSize</em> value is used to know how many bytes needs to
be read - thus the number of physical pages to read - in order to get a full page.</p>
 <h2 id="leaf-deserialization">Leaf deserialization</h2>
-<p>On the other way, when we deserialize a leaf, we won't deserialize the keys and
the values (it would be too expensive, if the leaf is discarded from memory immediately, when
we only need to read one single key and value). We thus keep the byte[] form of each keys
and each values, which will be deserialized on demand.</p>
-<p>We use two data structures to store a key and a value :
+<p>When a leaf needs to be deserialized, not all the keys and the values are deserialized
at once. It would be too expensive, if the leaf is discarded from memory immediately, when
we only need to read one single key and value, hence the keys and values are kept in byte[]
form, and will be deserialized on demand.</p>
+<p>Two data structures are used to store a key and a value :
 <em> a <em>KeyHolder</em> for the key</em> a <em>ValueHolder</em>
for the value</p>
 <h3 id="keyholder">KeyHolder</h3>
 <p>The <em>KeyHolder</em> data structure holds the key in two ways :
-<em> serialized (raw)
+<em> serialized (raw (byte[]))
 </em> deserialized (key)</p>
-<p>When we just read the data from disk, we don't deserialize the keys. We do that
when needed.</p>
-<p>Here is a description of this class :</p>
-<pre>
-public class KeyHolder<K>
-{
-    /** The deserialized key */
-    private K key;
-
-    /** The ByteBuffer storing the key */
-    private byte[] raw;
-
-    /** The Key serializer */
-    private ElementSerializer<K> keySerializer;
-}
-</pre>
-
-<h4 id="keyholder-operations">KeyHolder operations</h4>
-<p>Here is a description of the available methods in the KeyHolder class.</p>
-<h5 id="constructors">Constructors</h5>
-<p>We have two constructors for this class, one which takes a deserialized key, the
other which takes a byte[].</p>
-<ul>
-<li><em>KeyHolder( ElementSerializer<K> keySerializer, K key )</em></li>
-</ul>
-<p>Here, we need to serailize the key immediately, as we may have to flush the key
to the disk. We then serialize the Key immediately and store the resulting byte[] into the
<em>raw</em> field.</p>
-<ul>
-<li>KeyHolder( ElementSerializer<K> keySerializer, byte[] raw )</li>
-</ul>
-<p>Here, we just get the serialized form. We don't need to deserialize it, as the key
might not be used anytime soon. We thus just update the <em>raw</em> field, and
the <em>key</em> field remains null.</p>
-<h5 id="getkey">getKey()</h5>
-<p>This method retuns the deserialized key. If it does not exist, then we deserialize
it on the fly using the <em>raw</em> field.</p>
-<h5 id="setkey">setKey()</h5>
-<p>This method set the key. We immediately serialize it, and store the results in the
<em>raw</em> field.</p>
-<h5 id="getraw">getRaw()</h5>
-<p>Returns the <em>raw</em> field. This method is only visible from the
classes in the same package.</p>
+<p>The keys are not deserialized immediately after the data was read from disk instead
they are deserialized on demand.</p>
 <h3 id="valueholder">ValueHolder</h3>
-<p>The <em>ValueHolder</em> data structure will store the list of values
associated with a key. As we may have more than one value, we use an internal structure for
that purpose. This is a complex data structure, if we compare it with the <em>KeyHolder</em>
one.</p>
-<p>In some case, the number of values to store is really big, this we need to use an
internal data structure that allows a quick retrieval of a value, plus we need to be able
to copy a page containing such a value in an efficient way. For these reasons, we use two
different internal data structures :
-<em> an array up to a threshold
-</em> a sub-BTree above this threshold</p>
-<p>When we reach the threshold, the array is transformed into a BTree, and the way
back if we get below this number. In order to avoid many array &lt;-&gt; btree transformations
if we continusously add and delete a value, the array -&gt; btree threshold is bigger
than the btree -&gt; array threshold.</p>
+<p>Mavibot supports multiple values for a key. The <em>ValueHolder</em>
data structure will store a set of values associated with a key. This is a complex data structure
when compared with the <em>KeyHolder</em>.</p>
+<p>An array is used as the default container to hold these values. In some cases, the
number of values to be stored is really big, and using an array will impact the performance.
+In such cases the array is replaced by a BTree, this helps in improving the retrieval time,
and also page copying becomes more efficient.</p>
+<p>When the number of the values stored reaches a threshold, the array is replaced
with a BTree, likewise if the number of values stored are is below the threshold then BTree
is replaced by an array.
+In order to avoid many array &lt;-&gt; btree transformations (e.g. when continusously
adding and deleting a value), the array -&gt; btree threshold is bigger than the btree
-&gt; array threshold.</p>
 <pre>
    0---1---2---...---TH-low--...--TH-high---...
    >-------------Array----------->>---BTree---... When we add new values.
@@ -244,57 +211,29 @@ public class KeyHolder<K>
    <-----Array-----<<--------BTree------------... When we delete values.
 </pre>
 
-<p>It's important to know that the sub-BTree will hold only keys, and no values. The
sub-btree Keys will be the values we have to store.</p>
-<p>Here is the description of this class :</p>
-<pre>
-public class ValueHolder<V> implements Cloneable
-{
-    /** The deserialized value */
-    private V[] valueArray;
-
-    /** The BTree storing multiple value, if we have moe than a threashold values */
-    private BTree<V, V> valueBtree;
-
-    /** The serialized value */
-    private byte[] raw;
-
-    /** A flag set to true if the values are stored in a BTree */
-    private boolean isSubBtree = false;
-
-    /** The RecordManager */
-    private BTree<?, V> btree;
-
-    /** The Value serializer */
-    private ElementSerializer<V> valueSerializer;
-
-    /** An internal flag used when the values are not yet deserialized */
-    private boolean isRaw = true;
-}
-</pre>
-
-<p>As we can see, we use two different fields to store the data, either the <em>valueArray</em>
field or the <em>valueBtree</em> field. The <em>raw</em> field contains
the serialized values when the values are stored in an array; It's null either if the values
are stored in a BTree, or if we already have deserialized the values.</p>
+<p>It's important to note that the values inserted into the sub-BTree will be stored
as keys, and all the values of this sub-BTree will contain nulls. The sub-btree Keys will
be the values of the key present in the parent-BTree.</p>
 <h4 id="rawdeserialized-values">Raw/deserialized values</h4>
 <p>One key for obtaining good performances is to avoid any useless deserialization.
This is easy to implement for the <em>KeyHolder</em>, as we only store one single
key. For values, it's slightly more complex, as we may have more than one value. The following
rules should be followed :</p>
 <ul>
-<li>don't deserialized until necessary (ie, when one need to get one value)</li>
-<li>don't serialize when unnedded (ie until the values must be written back to disk)</li>
+<li>don't deserialized until necessary (ie, until one needs to get a value)</li>
+<li>don't serialize until writing (i.e, until the value(s) must be written to disk)</li>
 </ul>
-<p>In fact, we may be in three different states :</p>
+<p>In fact, a value may be in three different states :</p>
 <ul>
 <li>all the values are serialized (when we just read the page from disk)</li>
-<li>non of the values are serialized (when we just created a ValueHolder with new values)</li>
+<li>none of the values are serialized (when we just created a ValueHolder with new
values)</li>
 <li>somewhere in the middle, when we are modifying a ValueHolder which has been read
from the disk</li>
 </ul>
 <p>The third case is the complex one. We should consider two different cases though
:
-<em> the values are stored in a sub BTree : we don't have to deal with this problem,
it's up to the sub-btree to deal with it
-</em> the values are stored in an array : we don't want to store half of the values
as byte[], and half of the values as java instances. We must pick either one form or the other.
In this case, as soon as we have to manipulate values in Java, then we need to deserialize
all the values.</p>
+<em> the values are stored in a sub-BTree : we don't have to deal with this problem,
it's up to the sub-btree to deal with it
+</em> the values are stored in an array : we don't want to store half of the values
as byte[], and half of the values as java instances. We must pick either one form or the other.
In this case, as soon as we have to manipulate values in Java, we must deserialize all the
values.</p>
 <h4 id="valueholder-operations">ValueHolder operations</h4>
 <p>The possible operations on a ValueHolder are the following :</p>
 <ul>
-<li>add( value ) : Insert a new value into the ValueHolder. If we reach the upper threshold,
then the array is converted into a BTree. In any case, we inject the new value into the array
or the BTree so that we keep all the value ordered (the ValueSerializer must have a <em>Comparator</em>).</li>
+<li>add( value ) : Insert a new value into the ValueHolder. If we reach the upper threshold,
then the array is converted into a BTree. In any case, we inject the new value into the array(in
the correct order using the comparator present in the serializer) or the BTree.</li>
 </ul>
-<p>As we need to compare values, they must be deserialised, so we need to do it if
it's not already done (the values are not deserialiezed when the page is read from the disk).
Note that it's not necessary for the sub BTree, as it's up to the sub-btree to deserialize
the keys on the fly</p>
-<p>The <em>add</em> algorithm will thus be :</p>
+<p>As we need to compare values, they must be deserialised, so we need to do it if
it's not already done (the values are not deserialiezed when the page is read from the disk).
Note that it's not necessary for the sub-btree, as it's up to the sub-btree to deserialize
the keys on the fly.</p>
+<p>Thus the <em>add</em> algorithm will be :</p>
 <pre>
   if the values are not yet deserialized
     then deserialize all the values



Mime
View raw message