directory-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From build...@apache.org
Subject svn commit: r908301 - in /websites/staging/directory/trunk/content: ./ mavibot/user-guide/7.2-physical-storage.html
Date Wed, 07 May 2014 11:51:19 GMT
Author: buildbot
Date: Wed May  7 11:51:19 2014
New Revision: 908301

Log:
Staging update by buildbot for directory

Modified:
    websites/staging/directory/trunk/content/   (props changed)
    websites/staging/directory/trunk/content/mavibot/user-guide/7.2-physical-storage.html

Propchange: websites/staging/directory/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed May  7 11:51:19 2014
@@ -1 +1 @@
-1592675
+1592980

Modified: websites/staging/directory/trunk/content/mavibot/user-guide/7.2-physical-storage.html
==============================================================================
--- websites/staging/directory/trunk/content/mavibot/user-guide/7.2-physical-storage.html
(original)
+++ websites/staging/directory/trunk/content/mavibot/user-guide/7.2-physical-storage.html
Wed May  7 11:51:19 2014
@@ -155,50 +155,48 @@
 <p>When associated with a RecordManager, Mavibot stores all the Btrees in one single
file, which is split in many physical pages, all having the same size. </p>
 <blockquote>
 <p><strong>Note</strong>
-Currently, the choice was to use one single size for all the pages, regardless the data we
store into them. The rationnal is to
+Currently, the choice was to use same size for all the pages, regardless of the data stored
into them. The rationnal is to
 get close to the OS page size (frequently 512 bytes or 4096 bytes). This is not necessarily
the best choice though, let's say 
 it's something we might want to change later.</p>
 </blockquote>
 <h2 id="general-file-structure">General file structure</h2>
-<p>The file we use to store the data is a plain binary file, used to store all the
BTrees. We can store many btrees in one single file.</p>
-<p>This file is considered as a fileSystem, with fixed size 'pages' (a page is an array
of bytes). The page size is arbitrary fixed when the RecordManager is created, and we will
store every logical data n those physical pages, which will require to spread the logical
data in many pages in most of the cases.</p>
+<p>The data is stored in a pure binary file. All btrees are stored in this single file.</p>
+<p>This file is considered as a FileSystem, with fixed size 'pages' (a page is an array
of bytes). The page size is fixed when the RecordManager is created. The data represented
in a logical page may spread across many physical pages.</p>
 <h3 id="pageio">PageIO</h3>
-<p>Let's first introduce the <em>PageIO</em>, which is used to store the
data on disk.</p>
-<p>A <em>PageIO</em> contains some raw data. As we have to map some logical
data that may be wider than a physical fixed size <em>PageIO</em>, we use potentially
more than one <em>PageIO</em> to store the data, and we link the <em>PageIO</em>s
alltogether.</p>
-<p>Each <em>PageIO</em> has a height bytes pointer at the beginning, pointing
to the next PageIO (or to nothing, if there is no more <em>PageIO</em> in the
chain), plus an extra 4 bytes on the first <em>PageIO</em> to define the number
of bytes stored in the chain of PageIO. Here is the mapping between a logical page and some
PageIOs :</p>
+<p>A <em>PageIO</em> represents a physical block of the database file and
contains complete or a part of the logical page's data. PageIOs of a logical page are linked
together.</p>
+<p>Each <em>PageIO</em> has an eight byte pointer at the beginning, pointing
to the next PageIO (or to nothing, if it is the last <em>PageIO</em> in the chain),
plus an extra 4 bytes on the first <em>PageIO</em> to define the number of bytes
stored in the chain of PageIO. Here is the mapping between a logical page and some PageIOs
:</p>
 <p><img alt="PageIO mapping" src="images/PageIOLogical.png" /></p>
-<p>Every <em>PageIO</em>s are contiguous on disk, but the <em>PageIO</em>s
used to store a logical page may be located anywhere on the disk, they don't have to be continuous.</p>
+<p>All <em>PageIO</em>s are contiguous on disk, but the <em>PageIO</em>s
used to store a logical page may be located anywhere on the disk, they don't have to be continuous.</p>
 <p>Here is the structure of a <em>PageIO</em> on disk :</p>
 <ul>
-<li>next page offset (8 bytes) : the offset of the next <em>PageIO</em>,
or -1L if no more <em>PageIO</em> is needed</li>
-<li>data size (4 bytes) : for the first <em>PageIO</em>, the size of the
stored data across all the <em>PageIO</em>s used to store a page.</li>
-<li>data (N bytes) : a block of data, which size will be min( PageSize - offset - data
size, data size) for the first <em>PageIO</em> or min( PageSize - offset, data
size) for any other <em>PageIO</em>s</li>
+<li>next page offset (8 bytes) : the offset of the next <em>PageIO</em>,
or -1 if it is the last <em>PageIO</em></li>
+<li>data size (4 bytes) : the size of the data stored, this is only set in the first
<em>PageIO</em> of the chain of <em>PageIO</em>s used to store a logical
page.</li>
+<li>data (N bytes) : a block of data, whose size will be min( PageSize - offset - data
size, data size) for the first <em>PageIO</em> or min( PageSize - offset, data
size) for any other <em>PageIO</em>s</li>
 </ul>
 <h2 id="logical-structure-mapping-on-disk">Logical structure mapping on disk</h2>
 <p>We will now describe how each logical structure is serialized on disk.</p>
 <h3 id="recordmanager-header">RecordManager header</h3>
-<p>We keep a few bytes at the beginning of the file to store some critical information
about the RecordManager. Here is the list of stored informations :</p>
+<p>A few bytes at the beginning of the file are used to store some critical information
about the RecordManager. Here is the list of stored information details :</p>
 <ul>
 <li>The <em>PageIO</em> size (in bytes)</li>
 <li>The number of managed B-Trees</li>
 <li>The offset of the first free page</li>
 <li>The offset of the current B-tree of B-trees</li>
-<li>The offset of the previous B-tree of B-trees, if we an update operation is being
done</li>
+<li>The offset of the previous B-tree of B-trees, if an update operation was performed</li>
 <li>The offset of the current CopiedPages B-tree</li>
-<li>The offset of the previous CopiedPages B-tree, if we an update operation is being
done</li>
+<li>The offset of the previous CopiedPages B-tree, if an update operation was performed</li>
 </ul>
-<p>Here is a picture that shows the header content in two different case (when an update
operation is not completed, and when it's completed) :</p>
+<p>Here is a picture that shows the header content in two different cases (when an
update operation is not completed, and when it's completed) :</p>
 <p><img alt="RecordManager header" src="images/RMHeader.png" /></p>
-<p>We keep a track of the free pages (a free page is a PageIO that is not anymore used,
for instance because the data have been deleted.) This is done by keeping a link between each
PageIO and by pointing to the first feee PageIO and to the last free PageIO of this list.</p>
-<p>At startup, of course, we have no free pages, and those pointers contain the -1
offset.</p>
+<p>Recordmanager tracks the free pages (a free page is a PageIO that is not in use,
for instance when a key was deleted). This is done by linking each unused PageIO together
and finally updating the references of fisrt and last free pages. When there are no free pages
these pointers contain -1 as the offset value.</p>
 <p>This header is stored in a <em>PageIO</em>, at the very beginning of
the file.</p>
 <h3 id="the-recordmanager-structure">The RecordManager structure</h3>
-<p>The <em>RecordManager</em> manages <em>BTree</em>s, and
we have to store them into <em>PageIO</em>s. How do we do that ?</p>
-<p>All the <em>BTree</em>s have a header that contains many informations
about them, and point to a <em>rootPage</em> which is the current root (so the
root for the latest revision). As a <em>RecordManager</em> can manage more than
one <em>BTree</em>, we have to find a way to retreive all the <em>BTree</em>s
at startup : we use an internal link, so that a <em>BTree</em> points to the next
btree. At startup, we read the first <em>BTree</em> which is stored in the second
<em>PageIO</em> in the file (so just after the RecordManager header), then we
read the next <em>BTree</em> pointed by the first <em>BTree</em>,
and so on.</p>
+<p>Each <em>BTree</em> has a header that contains many details about it,
and points to a <em>rootPage</em> which is the current root (i.e., the root of
the latest revision). </p>
+<p>The special BTree called 'btree of btrees'(a.k.a BoB) contains the names and revisions
of each managed BTree. To load the managed btrees during startup, this BoB is read to find
the latest revision of each BTree and the associated root page's offset based on which the
btree will be loaded.</p>
 <h4 id="the-b-tree-info">The B-tree info</h4>
-<p>Each <em>B-tree</em> has some informations that never change over the
time. Here is the list of those informations :</p>
+<p>Each <em>B-tree</em> has some metadata that will never change after
creation. Here is the list of the details stored :</p>
 <ul>
-<li>pageSize (4 bytes) : the number of elements we cans store in a <em>Node</em>
or a <em>Leaf</em>. It's not related in any possible way with the <em>PageIO</em>
size.</li>
+<li>pageSize (4 bytes) : the number of elements we cans store in a <em>Node</em>
or a <em>Leaf</em>. It's not related in any way with the <em>PageIO</em>
size.</li>
 <li>nameSize (4 bytes) : The <em>BTree</em> name size</li>
 <li>name (nameSize bytes) : the <em>BTree</em> name</li>
 <li>keySerializerSize (4 bytes) : The size of the java <em>FQCN</em> for
the key serializer</li>
@@ -207,30 +205,30 @@ it's something we might want to change l
 <li>valueSerializer (valueSerializerSize bytes): The java <em>FQCN</em>
for the value serializer</li>
 <li>dupsAllowed (1 byte): tells if the <em>BTree</em> can have duplicated
values.</li>
 </ul>
-<p>As we can see, this header can have various length, and if one one the names is
long, we may need more than one PageIOs to store it.</p>
-<p>Note that a <em>B-tree</em> info can be stored on one or many <em>IOPage</em>s,
depending on its size.</p>
+<p>Note that a <em>B-tree</em> info can be stored in one or many <em>IOPage</em>s,
depending on its size.</p>
 <h4 id="the-btree-header">The BTree header</h4>
-<p>We may have many versions of the <em>B-tree</em> header, one per revision
(unless we overwrite it when we don't need to keep many revisions on disk).</p>
-<p>Each <em>BTree</em> has to keep many informations, which are :</p>
+<p>There will be one <em>B-tree</em> header per revision.</p>
+<p>Each <em>BTree</em> has to maintain many details, such as :</p>
 <ul>
-<li>revision (8 bytes) : the current revision for this <em>BTree</em>.
This value is updated after each modification in the <em>BTree</em>.</li>
-<li>nbElems (8 bytes) : the total number of elements we have in the <em>BTree</em>.
This is updated after each modification either.</li>
+<li>revision (8 bytes) : the current revision of the <em>BTree</em>. This
value is updated after each modification in the <em>BTree</em>.</li>
+<li>nbElems (8 bytes) : the total number of elements we have in the <em>BTree</em>.
This is updated after each modification.</li>
 <li>rootPage offset (8 bytes) : the position in the file where the <em>rootPage</em>
is stored</li>
 <li>btreeInfo offset (8 bytes) : the position of the <em>B-tree</em> info
page in the file</li>
 </ul>
-<p>Here is a diagram which present the <em>B-tree header</em> and the <em>B-tree
info</em> data structures on disk :</p>
+<p>Here is a diagram which represents the <em>B-tree header</em> and the
<em>B-tree info</em> data structures on disk :</p>
 <p><img alt="BTreeHeader" src="images/btreeHeader.png" /></p>
-<p>Note that they can be stored on one or many <em>IOPage</em>s, depending
on its size.</p>
-<p>All in all, when we have more than one <em>BTree</em> stored in the
file, the content of the file which stores the <em>BTree</em> headers will look
like this one :</p>
+<p>Note that they can be stored on one or more <em>IOPage</em>s, depending
on the size.</p>
+<p>The below picture shows the contents of the btree headers present in the database
file when more than one btree is stored.</p>
 <p><img alt="BTrees" src="images/BTree.png" /></p>
-<p>Note that each <em>B-tree Header</em> has one root page, even if it
contains no data. In this schema, we show the root page just after the <em>BTree</em>
it is associated to, but after a few updates, the root page may perfectly well be stored elsewhere
on the disk.</p>
+<p>Note that each <em>B-tree Header</em> has one root page, even if it
contains no data. In this picture, the root page is shown just after the <em>BTree</em>
it is associated with, but after a few updates, the root page may be stored elsewhere on the
disk.</p>
 <h4 id="the-nodes-and-leaves">The Nodes and Leaves</h4>
-<p>Nodes and Leaves are logical <em>BTree</em> pages which are serialized
on disk into one to many <em>PageIO</em>s. They have slightly different data structures,
as <em>Node</em>s contains pointers to <em>Leaves</em>, and no data,
while <em>Leaves</em> contains data. In any case, both contain the keys. The <em>Node</em>
has one ore value than the <em>Leaf</em>, too.</p>
-<p>On disk, each <em>Node</em> and <em>Leaf</em> are stored
in <em>PageIO</em>s, as we said. A <em>Node</em> will have pointers
to some other logical pages, and on disk, those pointers will be offset of the first <em>PageIO</em>
used to store the logical page it points to.</p>
-<p>Here is the <em>Node</em> and <em>Leaf</em> data structures
once serialized :</p>
+<p>Nodes and Leaves are logical <em>BTree</em> pages which are stored in
one or more <em>PageIO</em>s. They have slightly different data structures, as
<em>Node</em>s contain pointers to <em>Leaves</em>, and no data, while
<em>Leaves</em> contain data, but both contain the keys. The number of keys present
in a <em>Node</em> is higher
+than that of a <em>Leaf</em> by one.</p>
+<p>On disk, a <em>Node</em> will have pointers to some other logical pages,
those pointers will be offset of the first <em>PageIO</em> used to store the logical
page it points to.</p>
+<p>The below picture shows a <em>Node</em> and <em>Leaf</em>
after serialized to disk :</p>
 <p><img alt="Node and Leaf" src="images/nodeLeaf.png" /></p>
-<p>Note that we store the size of the serialized data : this is necessary as we have
to know how many <em>PageIO</em>s will be needed to store the logical page.</p>
-<p>The <em>rootPage</em> is just a <em>Node</em> or a <em>Leaf</em>.</p>
+<p>Note that this is necessary to store the size of the serialized data in order to
know how many <em>PageIO</em>s will be needed to store the logical page.</p>
+<p>The <em>rootPage</em> can be either a <em>Node</em> or a
<em>Leaf</em>.</p>
 <h4 id="potential-improvement">Potential improvement</h4>
 <p>We can get better performance by serializing the data differently. Instead of storing
keys and values as byte arrays prefixed by their length, we could store an array of keys and
values' offsets before the associated byte[]. Here is the resulting data structure, once serialized
:</p>
 <p><img alt="Node and Leaf, improved" src="images/nodeLeaf2.png" /></p>



Mime
View raw message