labs-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From conflue...@apache.org
Subject [CONF] Apache Labs > Physical pages
Date Wed, 12 Jun 2013 09:56:00 GMT
<html>
<head>
    <base href="https://cwiki.apache.org/confluence">
            <link rel="stylesheet" href="/confluence/s/2042/9/1/_/styles/combined.css?spaceKey=labs&amp;forWysiwyg=true"
type="text/css">
    </head>
<body style="background: white;" bgcolor="white" class="email-body">
<div id="pageContent">
<div id="notificationFormat">
<div class="wiki-content">
<div class="email">
    <h2><a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages">Physical
pages</a></h2>
    <h4>Page <b>edited</b> by             <a href="https://cwiki.apache.org/confluence/display/~elecharny">Emmanuel
L├ęcharny</a>
    </h4>
        <br/>
                         <h4>Changes (2)</h4>
                                 
    
<div id="page-diffs">
                    <table class="diff" cellpadding="0" cellspacing="0">
    
            <tr><td class="diff-snipped" >...<br></td></tr>
            <tr><td class="diff-unchanged" >h1. Special BTrees <br> <br></td></tr>
            <tr><td class="diff-deleted-lines" style="color:#999;background-color:#fdd;text-decoration:line-through;">We
have two special *BTrees* we use to manage some  <br></td></tr>
            <tr><td class="diff-added-lines" style="background-color: #dfd;">We
have two special *BTrees* we use to manage the revisions and the copied pages. We will explain
what they are good for <br> <br>h2. Revision tree <br> <br>We need
to keep a track of each active revision, so that a search can work with a specific revision.
The idea is that when a search starts, it uses the latest revision, but as some modification
can occur while teh search is beng processed, some new revisions will be added. In some case,
we may want to keep a revision active for quite a long time. <br> <br>So we store
the active revisions in a dedicated *BTree*. <br> <br>As we may have many *BTrees*,
we have to use a key which is a combinaison of the *BTree* name and its revision. So the revision
*BTree* manage the revisions of all the managed *BTrees*. <br> <br>When a revision
is not anymore used, we can remove it from the revision *BTree*. <br> <br>This
*BTree* is not a MVCC *BTree*, like the other ones. In other words, we only keep the latest
revision of this *BTree* (ie, all the modified pages are immediately freed) <br> <br>h2.
Copied pages BTree <br> <br>Once we create a new revision, the pages we copied
are not anymore in use except if the revisions they are associated with are still in use.
The problem is that we can&#39;t discard those pages and move them to the free list until
the associated revision is free. <br> <br>We use a dedicated *BTree* to keep a
track of the copied pages, which will be reclaimed and moved to the free pages list once the
associated revision will be released. <br> <br>h1. Managing the free pages <br>
<br>We have a mechanism to manage the *PageIO* that are not anymore in use. This is
a linked list in which the free pages are added. If we need some page, we first look into
this list, and get back as many *PageIOs* as we need - until we reach the end of this list.
If we free some page, we add them at the end of the free list. <br> <br>We always
free a logical page, which may be stored into many *PageIOs*. The good thing is that those
*pageIOs* are already linked, so we just need to make the last free *PageIO* to point on the
first freed *PageIO*, and to move the pointer to the last free page to the last *PageIO* used
to store the logical page. <br></td></tr>
    
            </table>
    </div>                            <h4>Full Content</h4>
                    <div class="notificationGreySide">
        <p>The <b>RecordManager</b> stores all the Btrees in one single
file, which is split in many physical pages, all having the same size. </p>

<div class='panelMacro'><table class='infoMacro'><colgroup><col width='24'><col></colgroup><tr><td
valign='top'><img src="/confluence/images/icons/emoticons/information.gif" width="16"
height="16" align="absmiddle" alt="" border="0"></td><td><b>The page
size</b><br />Currently, the choice was to use one single size for all the pages,
regardless the data we store into them. The rationnal is to get close to the OS page size
(frequently 512 bytes or 4096 bytes). This is not necessarily the best choice though, let's
say it's something we might want to change later.</td></tr></table></div>

<p>If we except a small header at the beginning of the file, everything is stored in
<b>PageIO</b> in the file.</p>

<h1><a name="Physicalpages-PageIO"></a>PageIO</h1>

<p>a <b>PageIO</b> contains some raw data. As we have to map some logical
data that may be wider than a physical fixed size <b>PageIO</b>, we link the <b>PageIO</b>
so that they contain all the data contained in a logical page.</p>

<p>Each <b>PageIO</b> has a hight byte pointer at the beginning, plus an
extra 4 bytes on the first <b>PageIO</b> to define the number of bytes stored.
Here is the mapping between a logical page and some <b>PageIO</b> :</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/PageIOLogical.png?version=1&amp;modificationDate=1370968349000"
style="border: 0px solid black" /></span></p>

<p>Every <b>PageIO</b> are contiguous on disk.</p>

<h1><a name="Physicalpages-RecordManagerheader"></a>RecordManager header</h1>

<p>We keep a few bytes at the beginning of the file to store some critical information
about the <b>RecordManager</b> Here is the list of stored informations :</p>

<ul>
	<li>The <b>PageIO</b> size (in bytes)</li>
	<li>The number of managed BTrees</li>
	<li>The offset of the first free page</li>
	<li>The offset of the last free page</li>
</ul>


<p>Here is a picture that shows the header content :</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/RMHeader.png?version=1&amp;modificationDate=1370969865000"
style="border: 0px solid black" /></span></p>

<p>We keep a track of the free pages (a free page is a <b>PageIO</b> that
is not anymore used, for instance because the data have been deleted.) This is done by keeping
a link between each <b>PageIO</b> and by pointing to the first feee <b>PageIO</b>
and to the last free <b>PageIO</b></p>

<p>At startup, of course, we have no free pages, and those pointers contain the <b>-1</b>
offset.</p>

<h1><a name="Physicalpages-Theinternalstructure"></a>The internal structure</h1>

<p>The <b>RecordManager</b> manages <b>BTrees</b>, and we have
to store them into <b>PageIOs</b>. How do we do that ?</p>

<p>All the <b>BTrees</b> have a header that contains many informations about
them, and point to a <b>rootPage</b> which is the current root (so the root for
the latest revision). As a <b>RecordManager</b> can manage more than one BTree,
we have to find a way to retreive all the <b>BTrees</b> at startup : we use an
internal link, so that a btree points to the next btree. At startup, we read the first BTree
which is stored in the first <b>PageIO</b> in the file (so just after the <b>RecordManager</b>
header), then we read the next btree pointed by the first btree, and so on.</p>

<h2><a name="Physicalpages-TheBTreeheader"></a>The BTree header</h2>

<p>Each <b>BTree</b> has to keep many informations so that it can be used.
Those informations are :</p>

<ul>
	<li>revision : the current revision for this <b>BTree</b>. This value is
updated after each modification in the <b>BTree</b>.</li>
	<li>nbElems : the total number of elements we have in the <b>BTree</b>.
This is updated after each modification either.</li>
	<li>rootPage offset : the position in the file where the <b>rootPage</b>
is stored</li>
	<li>nextBtree offset : the position of the next <b>BTree</b> header in
the file (or -1 if we don't have any other <b>BTree</b>)</li>
	<li>pageSize : the number of elements we cans tore in a <b>Node</b> or
a <b>Leaf</b>. It's not related in any possible way with the <b>PageIO</b>
size.</li>
	<li>name : The <b>BTree</b> name</li>
	<li>keySerializer : The java FQCN for the key serializer</li>
	<li>valueSerializer : The java FQCN for the value serializer</li>
	<li>dupsAllowed : tells if the <b>BTree</b> can have duplicated values.</li>
</ul>


<p>Here is a diagram which present this data structure on disk :</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/btreeHeader.png?version=1&amp;modificationDate=1370970520000"
style="border: 0px solid black" /></span></p>

<p>Note that a <b>BTree</b> header can be stored on one or many <b>IOPages</b>,
depending on its size.</p>


<h2><a name="Physicalpages-TheNodesandLeaves"></a>The Nodes and Leaves</h2>

<p><b>Nodes</b> and <b>Leaves</b> are logical <b>BTree</b>
pages which are serialized on disk into one to many <b>PageIOs</b>. They have
slightly different data structures, as <b>Nodes</b> contains pointers to leaves,
and no data, while <b>Leaves</b> contains data.</p>

<p>On disk, each <b>Node</b> and <b>Leaf</b> are stored in <b>PageIOs</b>,
as we said. A <b>Node</b> will have pointers to some other logical pages, and
on disk, those pointers will be offset of the first <b>PageIO</b> used to store
the logical page it points to.</p>

<p>Here is the <b>Node</b> and <b>Leaf</b> data structures once
serialized :</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/nodeLeaf.png?version=1&amp;modificationDate=1370971026000"
style="border: 0px solid black" /></span></p>

<p>Note that we store the size of the serialized data : this is necessary as we have
to know how much <b>PageIOs</b> will be needed to store the logical page. </p>

<p>The <b>rootPage</b> is just a <b>Node</b> or a <b>Leaf</b>.</p>

<h1><a name="Physicalpages-SpecialBTrees"></a>Special BTrees</h1>

<p>We have two special <b>BTrees</b> we use to manage the revisions and
the copied pages. We will explain what they are good for</p>

<h2><a name="Physicalpages-Revisiontree"></a>Revision tree</h2>

<p>We need to keep a track of each active revision, so that a search can work with a
specific revision. The idea is that when a search starts, it uses the latest revision, but
as some modification can occur while teh search is beng processed, some new revisions will
be added. In some case, we may want to keep a revision active for quite a long time.</p>

<p>So we store the active revisions in a dedicated <b>BTree</b>.</p>

<p>As we may have many <b>BTrees</b>, we have to use a key which is a combinaison
of the <b>BTree</b> name and its revision. So the revision <b>BTree</b>
manage the revisions of all the managed <b>BTrees</b>.</p>

<p>When a revision is not anymore used, we can remove it from the revision <b>BTree</b>.</p>

<p>This <b>BTree</b> is not a MVCC <b>BTree</b>, like the other
ones. In other words, we only keep the latest revision of this <b>BTree</b> (ie,
all the modified pages are immediately freed)</p>

<h2><a name="Physicalpages-CopiedpagesBTree"></a>Copied pages BTree</h2>

<p>Once we create a new revision, the pages we copied are not anymore in use except
if the revisions they are associated with are still in use. The problem is that we can't discard
those pages and move them to the free list until the associated revision is free.</p>

<p>We use a dedicated <b>BTree</b> to keep a track of the copied pages,
which will be reclaimed and moved to the free pages list once the associated revision will
be released.</p>

<h1><a name="Physicalpages-Managingthefreepages"></a>Managing the free pages</h1>

<p>We have a mechanism to manage the <b>PageIO</b> that are not anymore
in use. This is a linked list in which the free pages are added. If we need some page, we
first look into this list, and get back as many <b>PageIOs</b> as we need - until
we reach the end of this list. If we free some page, we add them at the end of the free list.</p>

<p>We always free a logical page, which may be stored into many <b>PageIOs</b>.
The good thing is that those <b>pageIOs</b> are already linked, so we just need
to make the last free <b>PageIO</b> to point on the first freed <b>PageIO</b>,
and to move the pointer to the last free page to the last <b>PageIO</b> used to
store the logical page.</p>
    </div>
        <div id="commentsSection" class="wiki-content pageSection">
        <div style="float: right;">
            <a href="https://cwiki.apache.org/confluence/users/viewnotifications.action"
class="grey">Change Notification Preferences</a>
        </div>
        <a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages">View
Online</a>
        |
        <a href="https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=31824480&revisedVersion=7&originalVersion=6">View
Changes</a>
                |
        <a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages?showComments=true&amp;showCommentArea=true#addcomment">Add
Comment</a>
            </div>
</div>
</div>
</div>
</div>
</body>
</html>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@labs.apache.org
For additional commands, e-mail: commits-help@labs.apache.org


Mime
View raw message