labs-commits mailing list archives

From conflue...@apache.org
Subject [CONF] Apache Labs > Physical pages
Date Tue, 11 Jun 2013 16:58:00 GMT
<html>
<head>
    <base href="https://cwiki.apache.org/confluence">
            <link rel="stylesheet" href="/confluence/s/2042/9/1/_/styles/combined.css?spaceKey=labs&amp;forWysiwyg=true"
type="text/css">
    </head>
<body style="background: white;" bgcolor="white" class="email-body">
<div id="pageContent">
<div id="notificationFormat">
<div class="wiki-content">
<div class="email">
    <h2><a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages">Physical
pages</a></h2>
    <h4>Page <b>edited</b> by <a href="https://cwiki.apache.org/confluence/display/~elecharny">Emmanuel
Lécharny</a>
    </h4>
        <br/>
                         <h4>Changes (1)</h4>
                                 
    
<div id="page-diffs">
                    <table class="diff" cellpadding="0" cellspacing="0">
    
            <tr><td class="diff-snipped" >...<br></td></tr>
            <tr><td class="diff-unchanged" > <br>!PageIOLogical.png! <br></td></tr>
            <tr><td class="diff-added-lines" style="background-color: #dfd;">
<br>All the *PageIOs* are contiguous on disk. <br> <br>h1. RecordManager header
<br> <br>We keep a few bytes at the beginning of the file to store some critical
information about the *RecordManager*. Here is the list of stored information: <br>
<br>* The *PageIO* size (in bytes) <br>* The number of managed BTrees <br>*
The offset of the first free page <br>* The offset of the last free page <br>
<br>Here is a picture that shows the header content: <br> <br>!RMHeader.png!
<br> <br>We keep track of the free pages (a free page is a *PageIO* that is
no longer used, for instance because its data has been deleted). This is done by keeping
a link between the free *PageIOs* and by pointing to the first free *PageIO* and to the last free
*PageIO*. <br> <br>At startup, of course, we have no free pages, and those pointers
contain the *-1* offset. <br> <br>h1. The internal structure <br> <br>The
*RecordManager* manages *BTrees*, and we have to store them into *PageIOs*. How do we do that?
<br> <br>All the *BTrees* have a header that contains information about
them and points to a *rootPage*, which is the current root (so the root for the latest revision).
As a *RecordManager* can manage more than one BTree, we need a way to retrieve all
the *BTrees* at startup: we use an internal link, so that each BTree points to the next BTree.
At startup, we read the first BTree, which is stored in the first *PageIO* in the file (so
just after the *RecordManager* header), then we read the next BTree pointed to by the first BTree,
and so on. <br> <br>h2. The BTree header <br> <br>h2. The rootPage
<br> <br>The *rootPage* is a serialized *BTree* Page; it may be stored in one or
more *PageIOs*. <br></td></tr>
    
            </table>
    </div>                            <h4>Full Content</h4>
                    <div class="notificationGreySide">
        <p>The <b>RecordManager</b> stores all the BTrees in a single
file, which is split into many physical pages, all of the same size. </p>

<div class='panelMacro'><table class='infoMacro'><colgroup><col width='24'><col></colgroup><tr><td
valign='top'><img src="/confluence/images/icons/emoticons/information.gif" width="16"
height="16" align="absmiddle" alt="" border="0"></td><td><b>The page
size</b><br />Currently, a single size is used for all the pages,
regardless of the data we store in them. The rationale is to stay close to the OS page size
(frequently 512 bytes or 4096 bytes). This is not necessarily the best choice, though;
it is something we might want to change later.</td></tr></table></div>

<p>Apart from a small header at the beginning of the file, everything is stored in
<b>PageIOs</b>.</p>

<h1><a name="Physicalpages-PageIO"></a>PageIO</h1>

<p>A <b>PageIO</b> contains raw data. As the logical data we have to store
may be larger than a fixed-size physical <b>PageIO</b>, we link several <b>PageIOs</b>
together so that they hold all the data of one logical page.</p>

<p>Each <b>PageIO</b> starts with an eight-byte pointer, and the first
<b>PageIO</b> carries an extra 4 bytes that give the number of bytes stored.
Here is the mapping between a logical page and its <b>PageIOs</b>:</p>

<p><span class="image-wrap" style=""><img src="/confluence/download/attachments/31824480/PageIOLogical.png?version=1&amp;modificationDate=1370968349834"
style="border: 0px solid black" /></span></p>

<p>All the <b>PageIOs</b> are contiguous on disk.</p>
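
<p>As a rough illustration of the layout described above, the following sketch computes how many <b>PageIOs</b> a logical page of a given length would occupy. It assumes that the eight-byte pointer is present in every <b>PageIO</b> and that the 4-byte length field is present only in the first one; the class and method names are hypothetical and are not part of the <b>RecordManager</b> code.</p>

```java
/** Hypothetical helper for illustration only; not part of the RecordManager API. */
final class PageIoLayout {
    /** Size of the pointer stored at the beginning of every PageIO (from the text). */
    static final int LINK_SIZE = 8;

    /** Size of the "number of bytes stored" field, first PageIO only (from the text). */
    static final int LENGTH_SIZE = 4;

    private PageIoLayout() {
    }

    /** Returns the number of PageIOs needed to hold dataLength bytes of logical data. */
    static int pageIosNeeded(int pageSize, int dataLength) {
        if (pageSize <= LINK_SIZE + LENGTH_SIZE) {
            throw new IllegalArgumentException("page size too small: " + pageSize);
        }
        if (dataLength < 0) {
            throw new IllegalArgumentException("negative data length: " + dataLength);
        }

        // The first PageIO loses 8 bytes to the pointer and 4 bytes to the length.
        int firstCapacity = pageSize - LINK_SIZE - LENGTH_SIZE;
        if (dataLength <= firstCapacity) {
            return 1;
        }

        // Every following PageIO only loses the 8-byte pointer.
        int nextCapacity = pageSize - LINK_SIZE;
        int remaining = dataLength - firstCapacity;

        // One first page, plus the ceiling of remaining / nextCapacity.
        return 2 + (remaining - 1) / nextCapacity;
    }
}
```

<p>With 512-byte pages, for instance, the first <b>PageIO</b> would hold 500 bytes of data and each following one 504 bytes, so a 501-byte logical page already needs two <b>PageIOs</b> under these assumptions.</p>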

<h1><a name="Physicalpages-RecordManagerheader"></a>RecordManager header</h1>

<p>We keep a few bytes at the beginning of the file to store some critical information
about the <b>RecordManager</b>. Here is the list of stored information:</p>

<ul>
	<li>The <b>PageIO</b> size (in bytes)</li>
	<li>The number of managed BTrees</li>
	<li>The offset of the first free page</li>
	<li>The offset of the last free page</li>
</ul>


<p>Here is a picture that shows the header content:</p>

<p>[Figure: RMHeader.png, the RecordManager header content (image not available)]</p>
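
<p>The four header fields listed above could be serialized along the following lines. This is only a sketch: the field widths (two 4-byte integers followed by two 8-byte offsets), their order on disk, and all the names are assumptions made for the example; the page does not specify the actual encoding.</p>

```java
import java.nio.ByteBuffer;

/** Hypothetical header model; field widths and order are assumptions. */
final class RecordManagerHeaderSketch {
    /** Assumed layout: int + int + long + long. */
    static final int HEADER_SIZE = 4 + 4 + 8 + 8;

    /** Offset used when there is no free page. */
    static final long NO_PAGE = -1L;

    final int pageSize;        // the PageIO size, in bytes
    final int nbBTrees;        // the number of managed BTrees
    final long firstFreePage;  // offset of the first free page
    final long lastFreePage;   // offset of the last free page

    RecordManagerHeaderSketch(int pageSize, int nbBTrees,
                              long firstFreePage, long lastFreePage) {
        this.pageSize = pageSize;
        this.nbBTrees = nbBTrees;
        this.firstFreePage = firstFreePage;
        this.lastFreePage = lastFreePage;
    }

    /** Header of a new file: no BTree yet, no free page yet. */
    static RecordManagerHeaderSketch empty(int pageSize) {
        return new RecordManagerHeaderSketch(pageSize, 0, NO_PAGE, NO_PAGE);
    }

    /** Encodes the header into a buffer that is ready to be written to the file. */
    ByteBuffer write() {
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_SIZE);
        buffer.putInt(pageSize);
        buffer.putInt(nbBTrees);
        buffer.putLong(firstFreePage);
        buffer.putLong(lastFreePage);
        buffer.flip();
        return buffer;
    }

    /** Decodes a header from a buffer positioned at the start of the file content. */
    static RecordManagerHeaderSketch read(ByteBuffer buffer) {
        int pageSize = buffer.getInt();
        int nbBTrees = buffer.getInt();
        long firstFreePage = buffer.getLong();
        long lastFreePage = buffer.getLong();
        return new RecordManagerHeaderSketch(pageSize, nbBTrees, firstFreePage, lastFreePage);
    }
}
```

<p>Keeping the <b>PageIO</b> size first would let a reader learn the page size before interpreting anything else in the file, but that ordering is a choice made for this example.</p>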

<p>We keep track of the free pages (a free page is a <b>PageIO</b> that
is no longer used, for instance because its data has been deleted). This is done by keeping
a link between the free <b>PageIOs</b> and by pointing to the first free <b>PageIO</b>
and to the last free <b>PageIO</b>.</p>

<p>At startup, of course, we have no free pages, and those pointers contain the <b>-1</b>
offset.</p>
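
<p>The free-page bookkeeping described above can be modeled in memory as a singly linked chain with a head pointer and a tail pointer. In the sketch below, a map stands in for the link stored in each free <b>PageIO</b> on disk. Appending released pages at the tail and reusing pages from the head is an assumption, as are all the names; the page only says that both ends of the chain are tracked and that <b>-1</b> means "no page".</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the free-page chain; not the RecordManager API. */
final class FreePageListSketch {
    /** Offset meaning "no page", as stated in the text. */
    static final long NO_PAGE = -1L;

    // The two header pointers; both are -1 while no page is free.
    private long firstFree = NO_PAGE;
    private long lastFree = NO_PAGE;

    // Stands in for the link stored in each free PageIO on disk.
    private final Map<Long, Long> nextFree = new HashMap<>();

    /** Marks the PageIO at the given offset as free (assumed: appended at the tail). */
    void free(long offset) {
        nextFree.put(offset, NO_PAGE);
        if (lastFree == NO_PAGE) {
            // The chain was empty: the page becomes both the first and the last one.
            firstFree = offset;
        } else {
            // Link the previous tail to the newly freed page.
            nextFree.put(lastFree, offset);
        }
        lastFree = offset;
    }

    /** Returns a free PageIO offset (assumed: taken from the head), or -1 if none. */
    long allocate() {
        if (firstFree == NO_PAGE) {
            return NO_PAGE;
        }
        long page = firstFree;
        firstFree = nextFree.remove(page);
        if (firstFree == NO_PAGE) {
            // The chain is empty again: reset the tail pointer too.
            lastFree = NO_PAGE;
        }
        return page;
    }

    long firstFree() {
        return firstFree;
    }

    long lastFree() {
        return lastFree;
    }
}
```

<p>When <code>allocate()</code> returns -1, a caller would presumably have to extend the file with a new <b>PageIO</b>; that step is left out of the sketch, and freeing the same offset twice is not guarded against.</p>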

<h1><a name="Physicalpages-Theinternalstructure"></a>The internal structure</h1>

<p>The <b>RecordManager</b> manages <b>BTrees</b>, and we have
to store them into <b>PageIOs</b>. How do we do that?</p>

<p>All the <b>BTrees</b> have a header that contains information about
them and points to a <b>rootPage</b>, which is the current root (so the root for
the latest revision). As a <b>RecordManager</b> can manage more than one BTree,
we need a way to retrieve all the <b>BTrees</b> at startup: we use an
internal link, so that each BTree points to the next BTree. At startup, we read the first BTree,
which is stored in the first <b>PageIO</b> in the file (so just after the <b>RecordManager</b>
header), then we read the next BTree pointed to by the first BTree, and so on.</p>
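
<p>The startup walk described above might look like the following sketch. It assumes that the number of managed BTrees read from the <b>RecordManager</b> header tells the loader when to stop, and it abstracts the reading of a BTree header behind a function that returns the offset of the next BTree. Both points, and all the names, are assumptions; the page does not say how the end of the chain is marked.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

/** Hypothetical startup loader; not part of the RecordManager API. */
final class BTreeChainSketch {
    private BTreeChainSketch() {
    }

    /**
     * Collects the offsets of all the BTree headers by following the internal links.
     *
     * @param firstOffset    offset of the first PageIO, just after the file header
     * @param nbBTrees       number of managed BTrees, as stored in the file header
     * @param readNextOffset reads the BTree header at an offset and returns the
     *                       offset of the next BTree (stand-in for the disk access)
     */
    static List<Long> loadBTreeOffsets(long firstOffset, int nbBTrees,
                                       LongUnaryOperator readNextOffset) {
        List<Long> offsets = new ArrayList<>();
        long current = firstOffset;
        for (int i = 0; i < nbBTrees; i++) {
            offsets.add(current);
            // Follow the link stored in the current BTree header.
            current = readNextOffset.applyAsLong(current);
        }
        return offsets;
    }
}
```

<p>Because only the first BTree has a fixed position, the others can live anywhere in the file as long as the chain of links stays intact.</p>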

<h2><a name="Physicalpages-TheBTreeheader"></a>The BTree header</h2>

<h2><a name="Physicalpages-TherootPage"></a>The rootPage</h2>

<p>The <b>rootPage</b> is a serialized <b>BTree</b> Page; it
may be stored in one or more <b>PageIOs</b>.</p>
    </div>
        <div id="commentsSection" class="wiki-content pageSection">
        <div style="float: right;">
            <a href="https://cwiki.apache.org/confluence/users/viewnotifications.action"
class="grey">Change Notification Preferences</a>
        </div>
        <a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages">View
Online</a>
        |
        <a href="https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=31824480&revisedVersion=3&originalVersion=2">View
Changes</a>
                |
        <a href="https://cwiki.apache.org/confluence/display/labs/Physical+pages?showComments=true&amp;showCommentArea=true#addcomment">Add
Comment</a>
            </div>
</div>
</div>
</div>
</div>
</body>
</html>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@labs.apache.org
For additional commands, e-mail: commits-help@labs.apache.org

