labs-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From conflue...@apache.org
Subject [CONF] Apache Labs > Physical pages
Date Tue, 11 Jun 2013 17:37:00 GMT
<html>
<head>
    <base href="https://cwiki.apache.org/confluence">
            <link rel="stylesheet" href="/confluence/s/2042/9/1/_/styles/combined.css?spaceKey=labs&amp;forWysiwyg=true"
type="text/css">
    </head>
<body style="background: white;" bgcolor="white" class="email-body">
<div id="pageContent">
<div id="notificationFormat">
<div class="wiki-content">
<div class="email">
    <h2><a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages">Physical
pages</a></h2>
    <h4>Page <b>edited</b> by             <a href="https://cwiki.apache.org/confluence/display/~elecharny">Emmanuel
L├ęcharny</a>
    </h4>
        <br/>
                         <h4>Changes (6)</h4>
                                 
    
<div id="page-diffs">
                    <table class="diff" cellpadding="0" cellspacing="0">
    
            <tr><td class="diff-snipped" >...<br></td></tr>
            <tr><td class="diff-unchanged" >On disk, each *Node* and *Leaf* are
stored in *PageIOs*, as we said. A *Node* will have pointers to some other logical pages,
and on disk, those pointers will be offset of the first *PageIO* used to store the logical
page it points to. <br> <br></td></tr>
            <tr><td class="diff-changed-lines" >Here is the *Node* <span class="diff-added-words"style="background-color:
#dfd;">and *Leaf*</span> data <span class="diff-changed-words">structure<span
class="diff-added-chars"style="background-color: #dfd;">s</span></span> once
serialized : <br></td></tr>
            <tr><td class="diff-unchanged" > <br></td></tr>
            <tr><td class="diff-changed-lines" ><span class="diff-changed-words">!node<span
class="diff-added-chars"style="background-color: #dfd;">Leaf</span>.png!</span>
<br></td></tr>
            <tr><td class="diff-unchanged" > <br></td></tr>
            <tr><td class="diff-deleted-lines" style="color:#999;background-color:#fdd;text-decoration:line-through;">Here
is the *Leaf* data structure once serialized : <br></td></tr>
            <tr><td class="diff-added-lines" style="background-color: #dfd;">Note
that we store the size of the serialized data : this is necessary as we have to know how much
*PageIOs* will be needed to store the logical page.  <br></td></tr>
            <tr><td class="diff-unchanged" > <br></td></tr>
            <tr><td class="diff-deleted-lines" style="color:#999;background-color:#fdd;text-decoration:line-through;">!leaf.png!
<br> <br></td></tr>
            <tr><td class="diff-unchanged" >The *rootPage* is just a *Node* or
a *Leaf*. <br></td></tr>
            <tr><td class="diff-added-lines" style="background-color: #dfd;">
<br>h1. Special BTrees <br> <br>We have two special *BTrees* we use to manage
some  <br></td></tr>
    
            </table>
    </div>                            <h4>Full Content</h4>
                    <div class="notificationGreySide">
        <p>The <b>RecordManager</b> stores all the Btrees in one single
file, which is split in many physical pages, all having the same size. </p>

<div class='panelMacro'><table class='infoMacro'><colgroup><col width='24'><col></colgroup><tr><td
valign='top'><img src="/confluence/images/icons/emoticons/information.gif" width="16"
height="16" align="absmiddle" alt="" border="0"></td><td><b>The page
size</b><br />Currently, the choice was to use one single size for all the pages,
regardless the data we store into them. The rationnal is to get close to the OS page size
(frequently 512 bytes or 4096 bytes). This is not necessarily the best choice though, let's
say it's something we might want to change later.</td></tr></table></div>

<p>If we except a small header at the beginning of the file, everything is stored in
<b>PageIO</b> in the file.</p>

<h1><a name="Physicalpages-PageIO"></a>PageIO</h1>

<p>a <b>PageIO</b> contains some raw data. As we have to map some logical
data that may be wider than a physical fixed size <b>PageIO</b>, we link the <b>PageIO</b>
so that they contain all the data contained in a logical page.</p>

<p>Each <b>PageIO</b> has a hight byte pointer at the beginning, plus an
extra 4 bytes on the first <b>PageIO</b> to define the number of bytes stored.
Here is the mapping between a logical page and some <b>PageIO</b> :</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/PageIOLogical.png?version=1&amp;modificationDate=1370968349834"
style="border: 0px solid black" /></span></p>

<p>Every <b>PageIO</b> are contiguous on disk.</p>

<h1><a name="Physicalpages-RecordManagerheader"></a>RecordManager header</h1>

<p>We keep a few bytes at the beginning of the file to store some critical information
about the <b>RecordManager</b> Here is the list of stored informations :</p>

<ul>
	<li>The <b>PageIO</b> size (in bytes)</li>
	<li>The number of managed BTrees</li>
	<li>The offset of the first free page</li>
	<li>The offset of the last free page</li>
</ul>


<p>Here is a picture that shows the header content :</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/RMHeader.png?version=1&amp;modificationDate=1370969865956"
style="border: 0px solid black" /></span></p>

<p>We keep a track of the free pages (a free page is a <b>PageIO</b> that
is not anymore used, for instance because the data have been deleted.) This is done by keeping
a link between each <b>PageIO</b> and by pointing to the first feee <b>PageIO</b>
and to the last free <b>PageIO</b></p>

<p>At startup, of course, we have no free pages, and those pointers contain the <b>-1</b>
offset.</p>

<h1><a name="Physicalpages-Theinternalstructure"></a>The internal structure</h1>

<p>The <b>RecordManager</b> manages <b>BTrees</b>, and we have
to store them into <b>PageIOs</b>. How do we do that ?</p>

<p>All the <b>BTrees</b> have a header that contains many informations about
them, and point to a <b>rootPage</b> which is the current root (so the root for
the latest revision). As a <b>RecordManager</b> can manage more than one BTree,
we have to find a way to retreive all the <b>BTrees</b> at startup : we use an
internal link, so that a btree points to the next btree. At startup, we read the first BTree
which is stored in the first <b>PageIO</b> in the file (so just after the <b>RecordManager</b>
header), then we read the next btree pointed by the first btree, and so on.</p>

<h2><a name="Physicalpages-TheBTreeheader"></a>The BTree header</h2>

<p>Each <b>BTree</b> has to keep many informations so that it can be used.
Those informations are :</p>

<ul>
	<li>revision : the current revision for this <b>BTree</b>. This value is
updated after each modification in the <b>BTree</b>.</li>
	<li>nbElems : the total number of elements we have in the <b>BTree</b>.
This is updated after each modification either.</li>
	<li>rootPage offset : the position in the file where the <b>rootPage</b>
is stored</li>
	<li>nextBtree offset : the position of the next <b>BTree</b> header in
the file (or -1 if we don't have any other <b>BTree</b>)</li>
	<li>pageSize : the number of elements we cans tore in a <b>Node</b> or
a <b>Leaf</b>. It's not related in any possible way with the <b>PageIO</b>
size.</li>
	<li>name : The <b>BTree</b> name</li>
	<li>keySerializer : The java FQCN for the key serializer</li>
	<li>valueSerializer : The java FQCN for the value serializer</li>
	<li>dupsAllowed : tells if the <b>BTree</b> can have duplicated values.</li>
</ul>


<p>Here is a diagram which present this data structure on disk :</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/btreeHeader.png?version=1&amp;modificationDate=1370970520034"
style="border: 0px solid black" /></span></p>

<p>Note that a <b>BTree</b> header can be stored on one or many <b>IOPages</b>,
depending on its size.</p>


<h2><a name="Physicalpages-TheNodesandLeaves"></a>The Nodes and Leaves</h2>

<p><b>Nodes</b> and <b>Leaves</b> are logical <b>BTree</b>
pages which are serialized on disk into one to many <b>PageIOs</b>. They have
slightly different data structures, as <b>Nodes</b> contains pointers to leaves,
and no data, while <b>Leaves</b> contains data.</p>

<p>On disk, each <b>Node</b> and <b>Leaf</b> are stored in <b>PageIOs</b>,
as we said. A <b>Node</b> will have pointers to some other logical pages, and
on disk, those pointers will be offset of the first <b>PageIO</b> used to store
the logical page it points to.</p>

<p>Here is the <b>Node</b> and <b>Leaf</b> data structures once
serialized :</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/nodeLeaf.png?version=1&amp;modificationDate=1370971026118"
style="border: 0px solid black" /></span></p>

<p>Note that we store the size of the serialized data : this is necessary as we have
to know how much <b>PageIOs</b> will be needed to store the logical page. </p>

<p>The <b>rootPage</b> is just a <b>Node</b> or a <b>Leaf</b>.</p>

<h1><a name="Physicalpages-SpecialBTrees"></a>Special BTrees</h1>

<p>We have two special <b>BTrees</b> we use to manage some </p>
    </div>
        <div id="commentsSection" class="wiki-content pageSection">
        <div style="float: right;">
            <a href="https://cwiki.apache.org/confluence/users/viewnotifications.action"
class="grey">Change Notification Preferences</a>
        </div>
        <a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages">View
Online</a>
        |
        <a href="https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=31824480&revisedVersion=6&originalVersion=5">View
Changes</a>
                |
        <a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages?showComments=true&amp;showCommentArea=true#addcomment">Add
Comment</a>
            </div>
</div>
</div>
</div>
</div>
</body>
</html>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@labs.apache.org
For additional commands, e-mail: commits-help@labs.apache.org


Mime
View raw message