db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From j..@apache.org
Subject svn commit: r149475 - in incubator/derby/site/trunk: build/site/papers/container-format.png build/site/papers/pageformats.html src/documentation/content/xdocs/papers/container-format.aart src/documentation/content/xdocs/papers/pageformats.xml
Date Wed, 02 Feb 2005 03:29:25 GMT
Author: jta
Date: Tue Feb  1 19:29:24 2005
New Revision: 149475

URL: http://svn.apache.org/viewcvs?view=rev&rev=149475
Log:
Committed changes to papers/pageformats by Dibyendu Majumdar <dibyendu@mazumdar.demon.co.uk> .


Added:
    incubator/derby/site/trunk/build/site/papers/container-format.png   (with props)
    incubator/derby/site/trunk/src/documentation/content/xdocs/papers/container-format.aart   (with props)
Modified:
    incubator/derby/site/trunk/build/site/papers/pageformats.html
    incubator/derby/site/trunk/src/documentation/content/xdocs/papers/pageformats.xml

Added: incubator/derby/site/trunk/build/site/papers/container-format.png
URL: http://svn.apache.org/viewcvs/incubator/derby/site/trunk/build/site/papers/container-format.png?view=auto&rev=149475
==============================================================================
Binary file - no diff available.

Propchange: incubator/derby/site/trunk/build/site/papers/container-format.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: incubator/derby/site/trunk/build/site/papers/pageformats.html
URL: http://svn.apache.org/viewcvs/incubator/derby/site/trunk/build/site/papers/pageformats.html?view=diff&r1=149474&r2=149475
==============================================================================
--- incubator/derby/site/trunk/build/site/papers/pageformats.html (original)
+++ incubator/derby/site/trunk/build/site/papers/pageformats.html Tue Feb  1 19:29:24 2005
@@ -188,10 +188,15 @@
 </div>
 <h1>Derby On Disk Page Format</h1>
 <div class="abstract">This document describes the storage format of Derby disk pages. 
+
     This is a work-in-progress derived from Javadoc comments and 
+
     from explanations Mike Matrigali posted to the Derby lists. 
+
     Please post questions, comments, and corrections to 
+
     derby-dev@db.apache.org.
+
     </div>
 <div id="minitoc-area">
 <ul class="minitoc">
@@ -220,517 +225,1092 @@
 </li>
 <li>
 <a href="#allocpage">Allocation Page</a>
+<ul class="minitoc">
+<li>
+<a href="#%0A%0A%09Alloc+Page+detailed+implementation+notes">
+
+	Alloc Page detailed implementation notes</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Allocation+Extent">Allocation Extent</a>
 </li>
 </ul>
 </div>
-
-
     
 <a name="N1000F"></a><a name="introduction"></a>
 <h2 class="boxed"> Introduction </h2>
 <div class="section">
 <p>Derby stores table and index data in Containers, which currently map 
-        to files in the 
-        <span class="codefrag">seg0</span>
-        directory of the database. Data is stored in pages within the container.</p>
-<div class="frame fixme">
-<div class="label">Fixme (Dibyendu Majumdar)</div>
-<div class="content"> Do all containers map to a single file, or does each container map 
-        to a file? </div>
-</div>
+
+        to files in the <span class="codefrag">seg0</span>
+
+        directory of the database. In the current Derby implementation there is a a 1 to 1 mapping of
+
+        containers to files. Two containers never map to a single file and 1
+
+	  container never maps to multiple containers.</p>
+<p>
+
+       Data is stored in pages within the container.</p>
 <p>A page contains a set of records, which can be accessed by "slot", which 
+
         defines the order of the records on the page, or by "id" which defines 
+
         the identity of the records on the page. Clients access records by both 
+
         slot and id, depending on their needs.</p>
-<p>There are two types of pages - Raw Stored Pages which hold data, and 
-        Raw Stored Alloc Pages which hold page allocation information.</p>
 <p>A Table or a BTree index provides a row-based access mechanism (row-based 
+
         access interface is known as conglomerate). Rows are mapped to records 
-        in pages, in case of a table, a single row can span multiple records in 
+
+        in data pages; in case of a table, a single row can span multiple records in 
+
         multiple pages.</p>
+<p>A container can have three types of pages:</p>
+<ul>
+        
+<li>Header Page - which is just a specialized version of the Alloc Page.</li>
+        
+<li>Data Pages which hold data, and</li>
+        
+<li>Alloc Pages which hold page allocation information. An Alloc page is a specialized verion of the Data page.</li>
+      
+</ul>
+<p>The container can be visualised as:<br>
+<img alt="" src="container-format.png"></p>
+<p>
+
+Header Page is currently always page 0 of the container.  It
+
+contains information that raw store needs to maintain about the
+
+container once per container, and is currently implemented as an Alloc
+
+Page which "borrows" space from the alloc page for it's information.
+
+The original decision was that the designers did not want to waste a whole page for
+
+header information, so a part of the page was used and the first allocation
+
+map was put on the second half of it. See <span class="codefrag">AllocPage.java</span> for info about layout and
+
+borrowing.
+
+</p>
+<p>
+        
+<a href="#allocpage"> Allocation Page</a> - After page 0, all subsequent Allocation pages only
+
+have allocation bit maps.
+
+</p>
 </div>
     
-<a name="N10029"></a><a name="storedpage"></a>
+<a name="N10048"></a><a name="storedpage"></a>
 <h2 class="boxed">Data Page Format</h2>
 <div class="section">
 <p>A data page is broken into five sections. 
-        <img alt="" src="page-format.png">
-      </p>
-<a name="N10036"></a><a name="formatid"></a>
+
+        <img alt="" src="page-format.png"></p>
+<a name="N10054"></a><a name="formatid"></a>
 <h3 class="boxed">Format Id </h3>
 <p> The formatId is a 4 bytes array, it contains the format Id of this 
+
           page. The possible values are RAW_STORE_STORED_PAGE or RAW_STORE_ALLOC_PAGE.</p>
-<a name="N10040"></a><a name="pageheader"></a>
+<a name="N1005E"></a><a name="pageheader"></a>
 <h3 class="boxed"> Page Header </h3>
 <p> The page header is a fixed size, 56 bytes. </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4"> 
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+          
+<tr>
             
-<tr> 
-              
 <th colspan="1" rowspan="1">Size</th>
-              <th colspan="1" rowspan="1">Type</th>
-              <th colspan="1" rowspan="1">Description</th>
-            
+            <th colspan="1" rowspan="1">Type</th>
+            <th colspan="1" rowspan="1">Description</th>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">1 byte</td>
-              <td colspan="1" rowspan="1">boolean</td>
-              <td colspan="1" rowspan="1">is page an overflow page</td>
-            
+            <td colspan="1" rowspan="1">boolean</td>
+            <td colspan="1" rowspan="1">is page an overflow page</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">1 byte</td>
-              <td colspan="1" rowspan="1">byte</td>
-              <td colspan="1" rowspan="1">
+            <td colspan="1" rowspan="1">byte</td>
+            <td colspan="1" rowspan="1">
+              
 <p>page status is either VALID_PAGE or INVALID_PAGE(a field 
+
                   maintained in base page)</p>
-                
+              
 <p>page goes thru the following transition: 
+
                   <br>
+
                   VALID_PAGE &lt;-&gt; deallocated page -&gt; free page &lt;-&gt; 
+
                   VALID_PAGE</p>
-                
+              
 <p>deallocated and free page are both INVALID_PAGE as far as BasePage 
+
                   is concerned. 
+
                   <br>
+
                   When a page is deallocated, it transitioned from VALID_PAGE 
+
                   to INVALID_PAGE. 
+
                   <br>
+
                   When a page is allocated, it trnasitioned from INVALID_PAGE 
+
                   to VALID_PAGE.</p>
-</td>
             
+</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">8 bytes</td>
-              <td colspan="1" rowspan="1">long</td>
-              <td colspan="1" rowspan="1">pageVersion (a field maintained in base page)</td>
-            
+            <td colspan="1" rowspan="1">long</td>
+            <td colspan="1" rowspan="1">pageVersion (a field maintained in base page)</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">2 bytes</td>
-              <td colspan="1" rowspan="1">unsigned short</td>
-              <td colspan="1" rowspan="1">number of slots in slot offset table</td>
-            
+            <td colspan="1" rowspan="1">unsigned short</td>
+            <td colspan="1" rowspan="1">number of slots in slot offset table</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">4 bytes</td>
-              <td colspan="1" rowspan="1">integer</td>
-              <td colspan="1" rowspan="1">next record identifier</td>
-            
+            <td colspan="1" rowspan="1">integer</td>
+            <td colspan="1" rowspan="1">next record identifier</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">4 bytes</td>
-              <td colspan="1" rowspan="1">integer</td>
-              <td colspan="1" rowspan="1">generation number of this page (Future Use)</td>
-            
+            <td colspan="1" rowspan="1">integer</td>
+            <td colspan="1" rowspan="1">generation number of this page (Future Use)</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">4 bytes</td>
-              <td colspan="1" rowspan="1">integer</td>
-              <td colspan="1" rowspan="1">previous generation of this page (Future Use)</td>
-            
+            <td colspan="1" rowspan="1">integer</td>
+            <td colspan="1" rowspan="1">previous generation of this page (Future Use)</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">8 bytes</td>
-              <td colspan="1" rowspan="1">bipLocation</td>
-              <td colspan="1" rowspan="1">the location of the beforeimage page (Future Use)</td>
-            
+            <td colspan="1" rowspan="1">bipLocation</td>
+            <td colspan="1" rowspan="1">the location of the beforeimage page (Future Use)</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">2 bytes</td>
-              <td colspan="1" rowspan="1">unsigned short</td>
-              <td colspan="1" rowspan="1">number of deleted rows on page. (new release 2.0)</td>
-            
+            <td colspan="1" rowspan="1">unsigned short</td>
+            <td colspan="1" rowspan="1">number of deleted rows on page. (new release 2.0)</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">2 bytes</td>
-              <td colspan="1" rowspan="1">unsigned short</td>
-              <td colspan="1" rowspan="1">% of the page to keep free for updates</td>
-            
+            <td colspan="1" rowspan="1">unsigned short</td>
+            <td colspan="1" rowspan="1">% of the page to keep free for updates</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">2 bytes</td>
-              <td colspan="1" rowspan="1">short</td>
-              <td colspan="1" rowspan="1">spare for future use</td>
-            
+            <td colspan="1" rowspan="1">short</td>
+            <td colspan="1" rowspan="1">spare for future use</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">4 bytes</td>
-              <td colspan="1" rowspan="1">long</td>
-              <td colspan="1" rowspan="1">spare for future use (encryption uses to write random bytes 
+            <td colspan="1" rowspan="1">long</td>
+            <td colspan="1" rowspan="1">spare for future use (encryption uses to write random bytes 
+
                 here).</td>
-            
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">8 bytes</td>
-              <td colspan="1" rowspan="1">long</td>
-              <td colspan="1" rowspan="1">spare for future use</td>
-            
+            <td colspan="1" rowspan="1">long</td>
+            <td colspan="1" rowspan="1">spare for future use</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">8 bytes</td>
-              <td colspan="1" rowspan="1">long</td>
-              <td colspan="1" rowspan="1">spare for future use</td>
-            
-</tr>
+            <td colspan="1" rowspan="1">long</td>
+            <td colspan="1" rowspan="1">spare for future use</td>
           
+</tr>
+        
 </table>
 <div class="frame note">
 <div class="label">Note</div>
 <div class="content">Spare space is guaranteed to be writen with "0", so that future 
+
             use of field should not either not use "0" as a valid data item or 
+
             pick 0 as a valid default value so that on the fly upgrade can assume 
+
             that 0 means field was never assigned. </div>
 </div>
-<a name="N1016B"></a><a name="records"></a>
+<a name="N1018B"></a><a name="records"></a>
 <h3 class="boxed"> Records </h3>
 <p>The records section contains zero or more records. Each record starts 
+
             with a Record Header</p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4"> 
-            
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+          
 <caption>Record Header</caption>
+          
+<tr>
             
-<tr> 
-              
 <th colspan="1" rowspan="1">Type</th>
-              <th colspan="1" rowspan="1">Description</th>
-            
+            <th colspan="1" rowspan="1">Description</th>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">1 byte</td>
-              <td colspan="1" rowspan="1"> 
+            <td colspan="1" rowspan="1">
+              
 <p>Status bits for the record header</p>
+              
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
                 
-<table class="ForrestTable" cellspacing="1" cellpadding="4"> 
+<tr>
                   
-<tr> 
-                    
 <td colspan="1" rowspan="1">RECORD_INITIAL</td>
-                    <td colspan="1" rowspan="1">used when record header is first initialized</td>
-                  
+                  <td colspan="1" rowspan="1">used when record header is first initialized</td>
+                
 </tr>
+                
+<tr>
                   
-<tr> 
-                    
 <td colspan="1" rowspan="1">RECORD_DELETED</td>
-                    <td colspan="1" rowspan="1">used to indicate the record has been deleted</td>
-                  
+                  <td colspan="1" rowspan="1">used to indicate the record has been deleted</td>
+                
 </tr>
+                
+<tr>
                   
-<tr> 
-                    
 <td colspan="1" rowspan="1">RECORD_OVERFLOW</td>
-                    <td colspan="1" rowspan="1">used to indicate the record has been overflowed, it will 
+                  <td colspan="1" rowspan="1">used to indicate the record has been overflowed, it will 
+
                       point to the overflow page and ID</td>
-                  
+                
 </tr>
+                
+<tr>
                   
-<tr> 
-                    
 <td colspan="1" rowspan="1">RECORD_HAS_FIRST_FIELD</td>
-                    <td colspan="1" rowspan="1">used to indicate that firstField is stored will be stored. 
+                  <td colspan="1" rowspan="1">used to indicate that firstField is stored will be stored. 
+
                       When RECORD_OVERFLOW and RECORD_HAS_FIRST_FIELD both are 
+
                       set, part of record is on the page, the record header also 
+
                       stores the overflow point to the next part of the record.</td>
-                  
+                
 </tr>
+                
+<tr>
                   
-<tr> 
-                    
 <td colspan="1" rowspan="1">RECORD_VALID_MASK</td>
-                    <td colspan="1" rowspan="1">A mask of valid bits that can be set currently, such that 
+                  <td colspan="1" rowspan="1">A mask of valid bits that can be set currently, such that 
+
                       the following assert can be made: </td>
-                  
-</tr>
                 
+</tr>
+              
 </table>
-</td>
             
+</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">compressed int</td>
-              <td colspan="1" rowspan="1">record identifier</td>
-            
+            <td colspan="1" rowspan="1">record identifier</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">compressed long</td>
-              <td colspan="1" rowspan="1">overflow page only if RECORD_OVERFLOW is set</td>
-            
+            <td colspan="1" rowspan="1">overflow page only if RECORD_OVERFLOW is set</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">compressed int</td>
-              <td colspan="1" rowspan="1">overflow id only if RECORD_OVERFLOW is set</td>
-            
+            <td colspan="1" rowspan="1">overflow id only if RECORD_OVERFLOW is set</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">compressed int</td>
-              <td colspan="1" rowspan="1">first field only if RECORD_HAS_FIRST_FIELD is set - otherwise 
+            <td colspan="1" rowspan="1">first field only if RECORD_HAS_FIRST_FIELD is set - otherwise 
+
                 0</td>
-            
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">compressed int</td>
-              <td colspan="1" rowspan="1">number of fields in this portion - only if RECORD_OVERFLOW is 
+            <td colspan="1" rowspan="1">number of fields in this portion - only if RECORD_OVERFLOW is 
+
                 false OR RECORD_HAS_FIRST_FIELD is true - otherwise 0</td>
-            
-</tr>
           
+</tr>
+        
 </table>
 <div class="frame note">
 <div class="label">Long Rows</div>
 <div class="content"> A row is long if all of it's columns can't fit on a single page. 
+
             When storing a long row, the segment of the row which fits on the 
+
             page is left there, and a pointer column is added at the end of the 
+
             row. It points to another row in the same container on a different 
+
             page. That row will contain the next set of columns and a continuation 
+
             pointer if necessary. The overflow portion will be on an "overflow" 
+
             page, and that page may have overflow portions of other rows on it 
+
             (unlike overflow columns). </div>
 </div>
 <p>The Record Header is followed by one or more fields. Each field contains 
+
             a Field Header and optional Field Data.</p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4"> 
-            
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+          
 <caption>Stored Field Header Format</caption>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">status</td>
-              <td colspan="1" rowspan="1"> 
+            <td colspan="1" rowspan="1">
+              
 <p> The status is 1 byte, it indicates the state of the field. 
+
                   A FieldHeader can be in the following states: </p>
+              
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+                
+<tr>
                   
-<table class="ForrestTable" cellspacing="1" cellpadding="4"> 
-                    
-<tr> 
-                      
 <td colspan="1" rowspan="1">NULL</td>
-                      <td colspan="1" rowspan="1">if the field is NULL, no field data length is stored</td>
-                    
+                  <td colspan="1" rowspan="1">if the field is NULL, no field data length is stored</td>
+                
 </tr>
-                    
-<tr> 
-                      
+                
+<tr>
+                  
 <td colspan="1" rowspan="1">OVERFLOW</td>
-                      <td colspan="1" rowspan="1">indicates the field has been overflowed to another page. 
+                  <td colspan="1" rowspan="1">indicates the field has been overflowed to another page. 
+
                         overflow page and overflow ID is stored at the end of 
+
                         the user data. field data length must be a number greater 
+
                         or equal to 0, indicating the length of the field that 
+
                         is stored on the current page. The format looks like this: 
+
                         <img alt="" src="field-header-overflow.png">
+
                         overflowPage will be written as compressed long, overflowId 
+
                         will be written as compressed Int</td>
-                    
+                
 </tr>
-                    
-<tr> 
-                      
+                
+<tr>
+                  
 <td colspan="1" rowspan="1">NONEXISTENT</td>
-                      <td colspan="1" rowspan="1">the field no longer exists, e.g. column has been dropped 
+                  <td colspan="1" rowspan="1">the field no longer exists, e.g. column has been dropped 
+
                         during an alter table</td>
-                    
+                
 </tr>
-                    
-<tr> 
-                      
+                
+<tr>
+                  
 <td colspan="1" rowspan="1">EXTENSIBLE</td>
-                      <td colspan="1" rowspan="1">the field is of user defined data type. The field may 
+                  <td colspan="1" rowspan="1">the field is of user defined data type. The field may 
+
                         be tagged.</td>
-                    
+                
 </tr>
-                    
-<tr> 
-                      
+                
+<tr>
+                  
 <td colspan="1" rowspan="1">TAGGED</td>
-                      <td colspan="1" rowspan="1">the field is TAGGED if and only if it is EXTENSIBLE.</td>
-                    
+                  <td colspan="1" rowspan="1">the field is TAGGED if and only if it is EXTENSIBLE.</td>
+                
 </tr>
-                    
-<tr> 
-                      
+                
+<tr>
+                  
 <td colspan="1" rowspan="1">FIXED</td>
-                      <td colspan="1" rowspan="1">the field is FIXED if and only if it is used in the 
+                  <td colspan="1" rowspan="1">the field is FIXED if and only if it is used in the 
+
                         log records for version 1.2 and higher.</td>
-                    
+                
 </tr>
-                  
+              
 </table>
-                
-</td>
             
+</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">fieldDataLength</td>
-              <td colspan="1" rowspan="1"> The fieldDataLength is only set if the field is not NULL. It 
+            <td colspan="1" rowspan="1"> The fieldDataLength is only set if the field is not NULL. It 
+
                 is the length of the field that is stored on the current page. 
+
                 The fieldDataLength is a variable length CompressedInt. </td>
-            
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">fieldData</td>
-              <td colspan="1" rowspan="1">
+            <td colspan="1" rowspan="1">
+              
 <p> Overflow page and overflow id are stored as field data. If 
+
                 the overflow bit in status is set, the field data is the overflow 
+
                 information. When the overflow bit is not set in status, then, 
+
                 fieldData is the actually user data for the field. That means, 
+
                 field header consists only field status, and field data length. 
+
                 <br>
+
                 A non-overflow field: 
-                <br> 
-<img alt="" src="field-header-non-overflow.png"> <br>
+
+                <br>
+<img alt="" src="field-header-non-overflow.png"><br>
+
                 An overflow field: 
-                <br> 
-<img alt="" src="field-header-overflow.png"> <br> 
+
+                <br>
+<img alt="" src="field-header-overflow.png"><br>
 <strong>overflowPage 
-                  and overflowID</strong> 
+
+                  and overflowID</strong>
 <br>
+
                 The overflowPage is a variable length CompressedLong, overflowID 
+
                 is a variable Length CompressedInt. They are only stored when 
+
                 the field state is OVERFLOW. And they are not stored in the field 
+
                 header. Instead, they are stored at the end of the field data. 
+
                 The reason we do that is to save a copy if the field has to overflow. </p>
-              
-</td>
             
-</tr>
+</td>
           
+</tr>
+        
 </table>
 <div class="frame note">
 <div class="label">Long Columns</div>
 <div class="content"> A column is long if it can't fit on a single page. A long column 
+
             is marked as long in the base row, and it's field contains a pointer 
+
             to a chain of other rows in the same container with contain the data 
+
             of the row. Each of the subsequent rows is on a page to itself. Each 
+
             subsquent row, except for the last piece has 2 columns, the first 
+
             is the next segment of the row and the second is the pointer to the 
+
             the following segment. The last segment only has the data segment. 
+
           </div>
 </div>
-<a name="N102C5"></a><a name="slottable"></a>
+<a name="N102E1"></a><a name="slottable"></a>
 <h3 class="boxed">Slot Offset Table</h3>
 <p>The slot offset table is a table of 6 or 12 bytes per record, depending 
+
           on the pageSize being less or greater than 64K: </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4"> 
-            
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+          
 <caption>Slot Table Record</caption>
+          
+<tr>
             
-<tr> 
-              
 <th colspan="1" rowspan="1">Size</th>
-              <th colspan="1" rowspan="1">Content</th>
-            
+            <th colspan="1" rowspan="1">Content</th>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">2 bytes (unsigned short) or 4 bytes (int)</td>
-              <td colspan="1" rowspan="1">page offset for the record that is assigned to the slot</td>
-            
+            <td colspan="1" rowspan="1">page offset for the record that is assigned to the slot</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">2 bytes (unsigned short) or 4 bytes (int)</td>
-              <td colspan="1" rowspan="1">the length of the record on this page.</td>
-            
+            <td colspan="1" rowspan="1">the length of the record on this page.</td>
+          
 </tr>
+          
+<tr>
             
-<tr> 
-              
 <td colspan="1" rowspan="1">2 bytes (unsigned short) or 4 bytes (int)</td>
-              <td colspan="1" rowspan="1">the length of the reserved number of bytes for this record on 
+            <td colspan="1" rowspan="1">the length of the reserved number of bytes for this record on 
+
                 this page.</td>
-            
-</tr>
           
+</tr>
+        
 </table>
 <p>
+
           First slot is slot 0. The slot table grows backwards. Slots are never 
+
           left empty. </p>
-<a name="N1030C"></a><a name="checksum"></a>
+<a name="N10328"></a><a name="checksum"></a>
 <h3 class="boxed">Checksum</h3>
 <p>8 bytes of a java.util.zip.CRC32 checksum of the entire's page contents 
+
           without the 8 bytes representing the checksum.</p>
 </div>
     
-<a name="N10317"></a><a name="allocpage"></a>
+<a name="N10333"></a><a name="allocpage"></a>
 <h2 class="boxed">Allocation Page</h2>
 <div class="section">
 <p> An allocation page of the file container extends a normal Stored page, 
+
         with the exception that a hunk of space may be 'borrowed' by the file 
+
         container to store the file header.</p>
 <p> The borrowed space is not visible to the alloc page even though it is 
+
         present in the page data array. It is accessed directly by the FileContainer. 
+
         Any change made to the borrowed space is not managed or seen by the allocation 
+
         page.</p>
 <p> The reason for having this borrowed space is so that the container header 
+
         does not need to have a page of its own. </p>
-<p> 
+<p>
         
 <strong>Page Format</strong>
         
 <br>
+
         An allocation page extends a stored page, the on disk format is different 
+
         from a stored page in that N bytes are 'borrowed' by the container and 
+
         the page header of an allocation page will be slightly bigger than a normal 
+
         stored page. This N bytes are stored between the page header and the record 
+
         space.</p>
 <p> The reason why this N bytes can't simply be a row is because it needs 
+
         to be statically accessible by the container object to avoid a chicken 
+
         and egg problem of the container object needing to instantiate an alloc 
+
         page object before it can be objectified, and an alloc page object needing 
+
         to instantiate a container object before it can be objectified. So this 
+
         N bytes must be stored outside of the normal record interface yet it must 
+
         be settable because only the first alloc page has this borrowed space. 
+
         Other (non-first) alloc page have N == 0. 
+
         <br>
+<img alt="" src="alloc-page.png"></p>
+<p>
+
+	N is a byte that indicates the size of the borrowed space.  Once an alloc
+
+	page is initialized, the value of N cannot change.
+
+	</p>
+<p>
+
+	The maximum space that can be borrowed by the container is 256 bytes.
+
+      </p>
+<p>
+
+	The allocation pages are of the same page size as any other pages in the
+
+	container. The first allocation page of the FileContainer starts at the
+
+	first physical byte of the container.  Subsequent allocation pages are
+
+	chained via the nextAllocPageOffset.  Each allocation page is expected to
+
+	manage at least 1000 user pages (for 1K page size) so this chaining may not
+
+	be a severe performance hit.  The logical -&gt; physical mapping of an
+
+	allocation page is stored in the previous allocation page.  The container
+
+	object will need to maintain this mapping.</p>
+<p>
+
+	The following fields are stored in the page header:
+
+      </p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+        
+<caption>
+
+                Format of Alloc Page
+
+            </caption>
+        
+<tr>
+          
+<th colspan="1" rowspan="1">
+
+                Type
+
+             </th>
+          <th colspan="1" rowspan="1">
+
+                Description
+
+            </th>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">
+
+                int
+
+            </td>
+          <td colspan="1" rowspan="1">
+
+                FormatId
+
+            </td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">StoredPageHeader</td>
+          <td colspan="1" rowspan="1">see <a href="#storedpage">Stored Page Header</a></td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">nextAllocPageNumber - the next allocation page's number</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">nextAllocPageOffset - the file offset of the next allocation page</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">reserved1 - reserved for future usage</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">reserved2 - reserved for future usage</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">reserved3 - reserved for future usage</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">reserved4 - reserved for future usage</td>
+        
+</tr>
         
-<img alt="" src="alloc-page.png">
+<tr>
+          
+<td colspan="1" rowspan="1">byte</td>
+          <td colspan="1" rowspan="1">N - the size of the borrowed container info</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">byte[N]</td>
+          <td colspan="1" rowspan="1">containerInfo - the content of the borrowed container info</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">AllocExtent</td>
+          <td colspan="1" rowspan="1">The one and only extent on this alloc page.</td>
+        
+</tr>
+      
+</table>
+<p>
+
+	The allocation page contains allocation extent rows.  In this first cut
+
+	implementation, there is only 1 allocation extent row per allocation page.
+
+	</p>
+<p>
+
+	The allocation extent row is an externalizable object and is directly
+
+	written on to the page by the alloc page.  In other words, it will not be
+
+	converted in to a storeableRow.  This is to cut down overhead, enhance
+
+	performance and gives more control of the size and layout of the allocation
+
+	extent row to the alloc page.
+
+	</p>
+<a name="N10408"></a><a name="%0A%0A%09Alloc+Page+detailed+implementation+notes"></a>
+<h3 class="boxed">
+
+	Alloc Page detailed implementation notes</h3>
+<p>
+
+	Create Container - an embryonic allocation page is formatted on disk by a
+
+	special static function to avoid instantiating a full AllocPage object.
+
+	This embryonic allocation has enough information that it can find the
+
+	file header and not much else.  Then the allocation page is properly
+
+	initialized by creating the first extent.
+
+      </p>
+<p>
+
+	Open Container - A static AllocPage method will be used to read off the
+
+	container information directly from disk.  Even if
+
+	the first alloc page (page 0) is already in the page cache, it will not be
+
+	used because cleaning the alloc page will introduce a deadlock if the
+
+	container is not in the container cache.  Long term, the first alloc page
+
+	should probably live in the container cache rather than in the page cache.
+
+      </p>
+<p>
+
+	Get Page - The first alloc page (page 0) will be read into the page cache.
+
+	Continue to follow the alloc page chain until the alloc page that manages
+
+	the specified page is found.  From the alloc page, the physical offset of
+
+	the specified page is located.
+
+      </p>
+<p>
+
+	Cleaning alloc page - the alloc page is written out the same way any page
+
+	is written out.  The container object will provide a call back to the alloc
+
+	page to write the current version of the container object back into the
+
+	borrowed space before the alloc page itself is written out.
+
+	</p>
+<p>
+
+	Cleaning the container object - get the the first alloc page, dirty it and
+
+	clean it (which will cause it to call the container object to write itself
+
+	out into the borrowed space).  The versioning of the container is
+
+	independent of the versioning of the alloc page.  The container version is
+
+	stored inside the borrowed space and is opaque to the alloc page.
+
+	</p>
+<p>For the fields in an allocation extent row.</p>
+</div>
+    
+<a name="N10422"></a><a name="Allocation+Extent"></a>
+<h2 class="boxed">Allocation Extent</h2>
+<div class="section">
+<p>
+
+	An allocation extent row manages the page status of page in the extent.
+
+	AllocExtent is externalizable and is written to the AllocPage directly,
+
+	without being converted to a row first.
+
+	</p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+        
+<caption>Format of Allocation Extent</caption>
+        
+<tr>
+          
+<th colspan="1" rowspan="1">Type</th>
+          <th colspan="1" rowspan="1">Description</th>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">extentOffset - the begin physical byte offset of the first page of this extent</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">extentStart - the first logical page mananged by this extent.</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">extentEnd - the last page this extent can ever hope to manage.</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">int</td>
+          <td colspan="1" rowspan="1">extentLength - the number of pages allocated in this extent</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">int</td>
+          <td colspan="1" rowspan="1">
+            
+<p>extentStatus - status bits for the whole extent.
+
+				<br>HAS_DEALLOCATED - most likely, this extent has a deallocated 
+
+                        page somewhere. If !HAD_DEALLOCATED, the extent has no deallocated page.
+
+				<br>HAS_FREE - most likely, this extent has a free page somewhere.
+
+						If !HAS_FREE, there is no free page in the extent.
+
+				<br>ALL_FREE - most likely, this extent only has free pages, good 
+
+                        candidate for shrinking the file.
+
+						If !ALL_FREE, the extent is not all free.
+
+				<br>HAS_UNFILLED_PAGES - most likely, this extent has unfilled pages.
+
+						if !HAS_UNFILLED_PAGES, all pages are filled.
+
+				<br>KEEP_UNFILLED_PAGES - this extent keeps track of unfilled pages
+
+						(post v1.3).  If not set, this extent has no notion of
+
+						unfilled page and has no unFilledPage bitmap.
+
+				<br>NO_DEALLOC_PAGE_MAP - this extents do not have a dealloc and a
+
+						free page bit maps.  Prior to 2.0, there are 2 bit
+
+						maps, a deallocate page bit map and a free page bit
+
+						map.  Cloudscape 2.0 and later merged the dealloc page
+
+						bit map into the free page bit map.
+
+				<br>RETIRED - this extent contains only 'retired' pages, never use 
+
+                        any page from this extent.  The pages don't actually 
+
+                        exist, i.e., it maps to nothing (physicalOffset is 
+
+                        garbage).  The purpose of this extent is to blot out a 
+
+                        range of logical page numbers that no longer exists 
+
+                        for this container.  Use this to reuse a physical page
+
+                        when a logical page has exhausted all recordId or for
+
+                        logical pages that has been shrunk out.
+
+               </p>
+          
+</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">int</td>
+          <td colspan="1" rowspan="1">preAllocLength - the number of pages that have been preallocated</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">int</td>
+          <td colspan="1" rowspan="1">reserved1</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">reserved2 - reserved for future use</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">long</td>
+          <td colspan="1" rowspan="1">reserved3 - reserved for future use</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">FreePages(bit)</td>
+          <td colspan="1" rowspan="1">Bitmap of free pages. Bit[i] is ON if page is free for immediate (re)use.</td>
+        
+</tr>
+        
+<tr>
+          
+<td colspan="1" rowspan="1">unFilledPages(bit)</td>
+          <td colspan="1" rowspan="1">Bitmap of pages that have free space. Bit[i] is ON if page is likely to be &lt; 1/2 full.</td>
+        
+</tr>
+      
+</table>
+<p>
+
+		org.apache.derby.iapi.services.io.FormatableBitSet is used to store the bit map.  
+
+            FormatableBitSet is an externalizable class.
+
+         </p>
+<p>
+
+	A page can have the following logical state:
+
+	<br>Free - a page that is free to be used
+
+	<br>Valid - a page that is currently in use
+
+      </p>
+<p>
+
+	There is another type of transitional pages which pages that have been
+
+	allocated on disk but has not yet been used.  These pages are Free.
+
+	</p>
+<p>
+
+	Bit[K] freePages
+
+		Bit[i] is ON iff page i maybe free for reuse.  User must get the
+
+		dealloc page lock on the free page to make sure the transaction.
+
+	</p>
+<p>
+
+	K is the size of the bit array, it must be &gt;= length.
+
       </p>
 </div>
   

Added: incubator/derby/site/trunk/src/documentation/content/xdocs/papers/container-format.aart
URL: http://svn.apache.org/viewcvs/incubator/derby/site/trunk/src/documentation/content/xdocs/papers/container-format.aart?view=auto&rev=149475
==============================================================================
--- incubator/derby/site/trunk/src/documentation/content/xdocs/papers/container-format.aart (added)
+++ incubator/derby/site/trunk/src/documentation/content/xdocs/papers/container-format.aart Tue Feb  1 19:29:24 2005
@@ -0,0 +1,13 @@
++--------+
+| header |
++--------+
+| data   |
++--------+
+| data   |
++--------+
+| ...    |
++--------+
+| alloc  |
++--------+
+| data   |
++--------+

Propchange: incubator/derby/site/trunk/src/documentation/content/xdocs/papers/container-format.aart
------------------------------------------------------------------------------
    svn:eol-style = native

Modified: incubator/derby/site/trunk/src/documentation/content/xdocs/papers/pageformats.xml
URL: http://svn.apache.org/viewcvs/incubator/derby/site/trunk/src/documentation/content/xdocs/papers/pageformats.xml?view=diff&r1=149474&r2=149475
==============================================================================
--- incubator/derby/site/trunk/src/documentation/content/xdocs/papers/pageformats.xml (original)
+++ incubator/derby/site/trunk/src/documentation/content/xdocs/papers/pageformats.xml Tue Feb  1 19:29:24 2005
@@ -1,372 +1,855 @@
 <?xml version="1.0"?>
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
-<document> 
-  <header> 
+<document>
+  <header>
     <title>Derby On Disk Page Format</title>
     <abstract>This document describes the storage format of Derby disk pages. 
+
     This is a work-in-progress derived from Javadoc comments and 
+
     from explanations Mike Matrigali posted to the Derby lists. 
+
     Please post questions, comments, and corrections to 
+
     derby-dev@db.apache.org.
+
     </abstract>
   </header>
   <body>
-
-
-    <section id="introduction"> 
+    <section id="introduction">
       <title> Introduction </title>
       <p>Derby stores table and index data in Containers, which currently map 
-        to files in the 
-        <code>seg0</code>
-        directory of the database. Data is stored in pages within the container.</p>
-      <fixme author="Dibyendu Majumdar"> Do all containers map to a single file, or does each container map 
-        to a file? </fixme>
+
+        to files in the <code>seg0</code>
+
+        directory of the database. In the current Derby implementation there is a a 1 to 1 mapping of
+
+        containers to files. Two containers never map to a single file and 1
+
+	  container never maps to multiple containers.</p>
+      <p>
+
+       Data is stored in pages within the container.</p>
       <p>A page contains a set of records, which can be accessed by "slot", which 
+
         defines the order of the records on the page, or by "id" which defines 
+
         the identity of the records on the page. Clients access records by both 
+
         slot and id, depending on their needs.</p>
-      <p>There are two types of pages - Raw Stored Pages which hold data, and 
-        Raw Stored Alloc Pages which hold page allocation information.</p>
       <p>A Table or a BTree index provides a row-based access mechanism (row-based 
+
         access interface is known as conglomerate). Rows are mapped to records 
-        in pages, in case of a table, a single row can span multiple records in 
+
+        in data pages; in case of a table, a single row can span multiple records in 
+
         multiple pages.</p>
+      <p>A container can have three types of pages:</p>
+      <ul>
+        <li>Header Page - which is just a specialized version of the Alloc Page.</li>
+        <li>Data Pages which hold data, and</li>
+        <li>Alloc Pages which hold page allocation information. An Alloc page is a specialized verion of the Data page.</li>
+      </ul>
+      <p>The container can be visualised as:<br/><img alt="" src="container-format.png"/></p>
+      <p>
+
+Header Page is currently always page 0 of the container.  It
+
+contains information that raw store needs to maintain about the
+
+container once per container, and is currently implemented as an Alloc
+
+Page which "borrows" space from the alloc page for it's information.
+
+The original decision was that the designers did not want to waste a whole page for
+
+header information, so a part of the page was used and the first allocation
+
+map was put on the second half of it. See <code>AllocPage.java</code> for info about layout and
+
+borrowing.
+
+</p>
+      <p>
+        <a href="#allocpage"> Allocation Page</a> - After page 0, all subsequent Allocation pages only
+
+have allocation bit maps.
+
+</p>
     </section>
-    <section id="storedpage"> 
+    <section id="storedpage">
       <title>Data Page Format</title>
       <p>A data page is broken into five sections. 
-        <img src="page-format.png" alt=""/>
-      </p>
-      <section id="formatid"> 
+
+        <img alt="" src="page-format.png"/></p>
+      <section id="formatid">
         <title>Format Id </title>
         <p> The formatId is a 4 bytes array, it contains the format Id of this 
+
           page. The possible values are RAW_STORE_STORED_PAGE or RAW_STORE_ALLOC_PAGE.</p>
       </section>
-      <section id="pageheader"> 
+      <section id="pageheader">
         <title> Page Header </title>
         <p> The page header is a fixed size, 56 bytes. </p>
-          <table> 
-            <tr> 
-              <th>Size</th>
-              <th>Type</th>
-              <th>Description</th>
-            </tr>
-            <tr> 
-              <td>1 byte</td>
-              <td>boolean</td>
-              <td>is page an overflow page</td>
-            </tr>
-            <tr> 
-              <td>1 byte</td>
-              <td>byte</td>
-              <td><p>page status is either VALID_PAGE or INVALID_PAGE(a field 
+        <table>
+          <tr>
+            <th>Size</th>
+            <th>Type</th>
+            <th>Description</th>
+          </tr>
+          <tr>
+            <td>1 byte</td>
+            <td>boolean</td>
+            <td>is page an overflow page</td>
+          </tr>
+          <tr>
+            <td>1 byte</td>
+            <td>byte</td>
+            <td>
+              <p>page status is either VALID_PAGE or INVALID_PAGE(a field 
+
                   maintained in base page)</p>
-                <p>page goes thru the following transition: 
+              <p>page goes thru the following transition: 
+
                   <br/>
+
                   VALID_PAGE &lt;-&gt; deallocated page -&gt; free page &lt;-&gt; 
+
                   VALID_PAGE</p>
-                <p>deallocated and free page are both INVALID_PAGE as far as BasePage 
+              <p>deallocated and free page are both INVALID_PAGE as far as BasePage 
+
                   is concerned. 
+
                   <br/>
+
                   When a page is deallocated, it transitioned from VALID_PAGE 
+
                   to INVALID_PAGE. 
+
                   <br/>
+
                   When a page is allocated, it trnasitioned from INVALID_PAGE 
-                  to VALID_PAGE.</p></td>
-            </tr>
-            <tr> 
-              <td>8 bytes</td>
-              <td>long</td>
-              <td>pageVersion (a field maintained in base page)</td>
-            </tr>
-            <tr> 
-              <td>2 bytes</td>
-              <td>unsigned short</td>
-              <td>number of slots in slot offset table</td>
-            </tr>
-            <tr> 
-              <td>4 bytes</td>
-              <td>integer</td>
-              <td>next record identifier</td>
-            </tr>
-            <tr> 
-              <td>4 bytes</td>
-              <td>integer</td>
-              <td>generation number of this page (Future Use)</td>
-            </tr>
-            <tr> 
-              <td>4 bytes</td>
-              <td>integer</td>
-              <td>previous generation of this page (Future Use)</td>
-            </tr>
-            <tr> 
-              <td>8 bytes</td>
-              <td>bipLocation</td>
-              <td>the location of the beforeimage page (Future Use)</td>
-            </tr>
-            <tr> 
-              <td>2 bytes</td>
-              <td>unsigned short</td>
-              <td>number of deleted rows on page. (new release 2.0)</td>
-            </tr>
-            <tr> 
-              <td>2 bytes</td>
-              <td>unsigned short</td>
-              <td>% of the page to keep free for updates</td>
-            </tr>
-            <tr> 
-              <td>2 bytes</td>
-              <td>short</td>
-              <td>spare for future use</td>
-            </tr>
-            <tr> 
-              <td>4 bytes</td>
-              <td>long</td>
-              <td>spare for future use (encryption uses to write random bytes 
+
+                  to VALID_PAGE.</p>
+            </td>
+          </tr>
+          <tr>
+            <td>8 bytes</td>
+            <td>long</td>
+            <td>pageVersion (a field maintained in base page)</td>
+          </tr>
+          <tr>
+            <td>2 bytes</td>
+            <td>unsigned short</td>
+            <td>number of slots in slot offset table</td>
+          </tr>
+          <tr>
+            <td>4 bytes</td>
+            <td>integer</td>
+            <td>next record identifier</td>
+          </tr>
+          <tr>
+            <td>4 bytes</td>
+            <td>integer</td>
+            <td>generation number of this page (Future Use)</td>
+          </tr>
+          <tr>
+            <td>4 bytes</td>
+            <td>integer</td>
+            <td>previous generation of this page (Future Use)</td>
+          </tr>
+          <tr>
+            <td>8 bytes</td>
+            <td>bipLocation</td>
+            <td>the location of the beforeimage page (Future Use)</td>
+          </tr>
+          <tr>
+            <td>2 bytes</td>
+            <td>unsigned short</td>
+            <td>number of deleted rows on page. (new release 2.0)</td>
+          </tr>
+          <tr>
+            <td>2 bytes</td>
+            <td>unsigned short</td>
+            <td>% of the page to keep free for updates</td>
+          </tr>
+          <tr>
+            <td>2 bytes</td>
+            <td>short</td>
+            <td>spare for future use</td>
+          </tr>
+          <tr>
+            <td>4 bytes</td>
+            <td>long</td>
+            <td>spare for future use (encryption uses to write random bytes 
+
                 here).</td>
-            </tr>
-            <tr> 
-              <td>8 bytes</td>
-              <td>long</td>
-              <td>spare for future use</td>
-            </tr>
-            <tr> 
-              <td>8 bytes</td>
-              <td>long</td>
-              <td>spare for future use</td>
-            </tr>
-          </table>
-          <note>Spare space is guaranteed to be writen with "0", so that future 
+          </tr>
+          <tr>
+            <td>8 bytes</td>
+            <td>long</td>
+            <td>spare for future use</td>
+          </tr>
+          <tr>
+            <td>8 bytes</td>
+            <td>long</td>
+            <td>spare for future use</td>
+          </tr>
+        </table>
+        <note>Spare space is guaranteed to be writen with "0", so that future 
+
             use of field should not either not use "0" as a valid data item or 
+
             pick 0 as a valid default value so that on the fly upgrade can assume 
+
             that 0 means field was never assigned. </note>
-        
       </section>
-      <section id="records"> 
+      <section id="records">
         <title> Records </title>
-        
-          <p>The records section contains zero or more records. Each record starts 
+        <p>The records section contains zero or more records. Each record starts 
+
             with a Record Header</p>
-          <table> 
-            <caption>Record Header</caption>
-            <tr> 
-              <th>Type</th>
-              <th>Description</th>
-            </tr>
-            <tr> 
-              <td>1 byte</td>
-              <td> <p>Status bits for the record header</p>
-                <table> 
-                  <tr> 
-                    <td>RECORD_INITIAL</td>
-                    <td>used when record header is first initialized</td>
-                  </tr>
-                  <tr> 
-                    <td>RECORD_DELETED</td>
-                    <td>used to indicate the record has been deleted</td>
-                  </tr>
-                  <tr> 
-                    <td>RECORD_OVERFLOW</td>
-                    <td>used to indicate the record has been overflowed, it will 
+        <table>
+          <caption>Record Header</caption>
+          <tr>
+            <th>Type</th>
+            <th>Description</th>
+          </tr>
+          <tr>
+            <td>1 byte</td>
+            <td>
+              <p>Status bits for the record header</p>
+              <table>
+                <tr>
+                  <td>RECORD_INITIAL</td>
+                  <td>used when record header is first initialized</td>
+                </tr>
+                <tr>
+                  <td>RECORD_DELETED</td>
+                  <td>used to indicate the record has been deleted</td>
+                </tr>
+                <tr>
+                  <td>RECORD_OVERFLOW</td>
+                  <td>used to indicate the record has been overflowed, it will 
+
                       point to the overflow page and ID</td>
-                  </tr>
-                  <tr> 
-                    <td>RECORD_HAS_FIRST_FIELD</td>
-                    <td>used to indicate that firstField is stored will be stored. 
+                </tr>
+                <tr>
+                  <td>RECORD_HAS_FIRST_FIELD</td>
+                  <td>used to indicate that firstField is stored will be stored. 
+
                       When RECORD_OVERFLOW and RECORD_HAS_FIRST_FIELD both are 
+
                       set, part of record is on the page, the record header also 
+
                       stores the overflow point to the next part of the record.</td>
-                  </tr>
-                  <tr> 
-                    <td>RECORD_VALID_MASK</td>
-                    <td>A mask of valid bits that can be set currently, such that 
+                </tr>
+                <tr>
+                  <td>RECORD_VALID_MASK</td>
+                  <td>A mask of valid bits that can be set currently, such that 
+
                       the following assert can be made: </td>
-                  </tr>
-                </table></td>
-            </tr>
-            <tr> 
-              <td>compressed int</td>
-              <td>record identifier</td>
-            </tr>
-            <tr> 
-              <td>compressed long</td>
-              <td>overflow page only if RECORD_OVERFLOW is set</td>
-            </tr>
-            <tr> 
-              <td>compressed int</td>
-              <td>overflow id only if RECORD_OVERFLOW is set</td>
-            </tr>
-            <tr> 
-              <td>compressed int</td>
-              <td>first field only if RECORD_HAS_FIRST_FIELD is set - otherwise 
+                </tr>
+              </table>
+            </td>
+          </tr>
+          <tr>
+            <td>compressed int</td>
+            <td>record identifier</td>
+          </tr>
+          <tr>
+            <td>compressed long</td>
+            <td>overflow page only if RECORD_OVERFLOW is set</td>
+          </tr>
+          <tr>
+            <td>compressed int</td>
+            <td>overflow id only if RECORD_OVERFLOW is set</td>
+          </tr>
+          <tr>
+            <td>compressed int</td>
+            <td>first field only if RECORD_HAS_FIRST_FIELD is set - otherwise 
+
                 0</td>
-            </tr>
-            <tr> 
-              <td>compressed int</td>
-              <td>number of fields in this portion - only if RECORD_OVERFLOW is 
+          </tr>
+          <tr>
+            <td>compressed int</td>
+            <td>number of fields in this portion - only if RECORD_OVERFLOW is 
+
                 false OR RECORD_HAS_FIRST_FIELD is true - otherwise 0</td>
-            </tr>
-          </table>
-          <note label="Long Rows"> A row is long if all of it's columns can't fit on a single page. 
+          </tr>
+        </table>
+        <note label="Long Rows"> A row is long if all of it's columns can't fit on a single page. 
+
             When storing a long row, the segment of the row which fits on the 
+
             page is left there, and a pointer column is added at the end of the 
+
             row. It points to another row in the same container on a different 
+
             page. That row will contain the next set of columns and a continuation 
+
             pointer if necessary. The overflow portion will be on an "overflow" 
+
             page, and that page may have overflow portions of other rows on it 
+
             (unlike overflow columns). </note>
-          <p>The Record Header is followed by one or more fields. Each field contains 
+        <p>The Record Header is followed by one or more fields. Each field contains 
+
             a Field Header and optional Field Data.</p>
-          <table> 
-            <caption>Stored Field Header Format</caption>
-            <tr> 
-              <td>status</td>
-              <td> <p> The status is 1 byte, it indicates the state of the field. 
+        <table>
+          <caption>Stored Field Header Format</caption>
+          <tr>
+            <td>status</td>
+            <td>
+              <p> The status is 1 byte, it indicates the state of the field. 
+
                   A FieldHeader can be in the following states: </p>
-                  <table> 
-                    <tr> 
-                      <td>NULL</td>
-                      <td>if the field is NULL, no field data length is stored</td>
-                    </tr>
-                    <tr> 
-                      <td>OVERFLOW</td>
-                      <td>indicates the field has been overflowed to another page. 
+              <table>
+                <tr>
+                  <td>NULL</td>
+                  <td>if the field is NULL, no field data length is stored</td>
+                </tr>
+                <tr>
+                  <td>OVERFLOW</td>
+                  <td>indicates the field has been overflowed to another page. 
+
                         overflow page and overflow ID is stored at the end of 
+
                         the user data. field data length must be a number greater 
+
                         or equal to 0, indicating the length of the field that 
+
                         is stored on the current page. The format looks like this: 
-                        <img src="field-header-overflow.png" alt=""/>
+
+                        <img alt="" src="field-header-overflow.png"/>
+
                         overflowPage will be written as compressed long, overflowId 
+
                         will be written as compressed Int</td>
-                    </tr>
-                    <tr> 
-                      <td>NONEXISTENT</td>
-                      <td>the field no longer exists, e.g. column has been dropped 
+                </tr>
+                <tr>
+                  <td>NONEXISTENT</td>
+                  <td>the field no longer exists, e.g. column has been dropped 
+
                         during an alter table</td>
-                    </tr>
-                    <tr> 
-                      <td>EXTENSIBLE</td>
-                      <td>the field is of user defined data type. The field may 
+                </tr>
+                <tr>
+                  <td>EXTENSIBLE</td>
+                  <td>the field is of user defined data type. The field may 
+
                         be tagged.</td>
-                    </tr>
-                    <tr> 
-                      <td>TAGGED</td>
-                      <td>the field is TAGGED if and only if it is EXTENSIBLE.</td>
-                    </tr>
-                    <tr> 
-                      <td>FIXED</td>
-                      <td>the field is FIXED if and only if it is used in the 
+                </tr>
+                <tr>
+                  <td>TAGGED</td>
+                  <td>the field is TAGGED if and only if it is EXTENSIBLE.</td>
+                </tr>
+                <tr>
+                  <td>FIXED</td>
+                  <td>the field is FIXED if and only if it is used in the 
+
                         log records for version 1.2 and higher.</td>
-                    </tr>
-                  </table>
-                </td>
-            </tr>
-            <tr> 
-              <td>fieldDataLength</td>
-              <td> The fieldDataLength is only set if the field is not NULL. It 
+                </tr>
+              </table>
+            </td>
+          </tr>
+          <tr>
+            <td>fieldDataLength</td>
+            <td> The fieldDataLength is only set if the field is not NULL. It 
+
                 is the length of the field that is stored on the current page. 
+
                 The fieldDataLength is a variable length CompressedInt. </td>
-            </tr>
-            <tr> 
-              <td>fieldData</td>
-              <td><p> Overflow page and overflow id are stored as field data. If 
+          </tr>
+          <tr>
+            <td>fieldData</td>
+            <td>
+              <p> Overflow page and overflow id are stored as field data. If 
+
                 the overflow bit in status is set, the field data is the overflow 
+
                 information. When the overflow bit is not set in status, then, 
+
                 fieldData is the actually user data for the field. That means, 
+
                 field header consists only field status, and field data length. 
+
                 <br/>
+
                 A non-overflow field: 
-                <br/> <img src="field-header-non-overflow.png" alt=""/> <br/>
+
+                <br/><img alt="" src="field-header-non-overflow.png"/><br/>
+
                 An overflow field: 
-                <br/> <img src="field-header-overflow.png" alt=""/> <br/> <strong>overflowPage 
-                  and overflowID</strong> <br/>
+
+                <br/><img alt="" src="field-header-overflow.png"/><br/><strong>overflowPage 
+
+                  and overflowID</strong><br/>
+
                 The overflowPage is a variable length CompressedLong, overflowID 
+
                 is a variable Length CompressedInt. They are only stored when 
+
                 the field state is OVERFLOW. And they are not stored in the field 
+
                 header. Instead, they are stored at the end of the field data. 
+
                 The reason we do that is to save a copy if the field has to overflow. </p>
-              </td>
-            </tr>
-          </table>
-          <note label="Long Columns"> A column is long if it can't fit on a single page. A long column 
+            </td>
+          </tr>
+        </table>
+        <note label="Long Columns"> A column is long if it can't fit on a single page. A long column 
+
             is marked as long in the base row, and it's field contains a pointer 
+
             to a chain of other rows in the same container with contain the data 
+
             of the row. Each of the subsequent rows is on a page to itself. Each 
+
             subsquent row, except for the last piece has 2 columns, the first 
+
             is the next segment of the row and the second is the pointer to the 
+
             the following segment. The last segment only has the data segment. 
+
           </note>
-        
       </section>
-      <section id="slottable"> 
+      <section id="slottable">
         <title>Slot Offset Table</title>
         <p>The slot offset table is a table of 6 or 12 bytes per record, depending 
+
           on the pageSize being less or greater than 64K: </p>
-          <table> 
-            <caption>Slot Table Record</caption>
-            <tr> 
-              <th>Size</th>
-              <th>Content</th>
-            </tr>
-            <tr> 
-              <td>2 bytes (unsigned short) or 4 bytes (int)</td>
-              <td>page offset for the record that is assigned to the slot</td>
-            </tr>
-            <tr> 
-              <td>2 bytes (unsigned short) or 4 bytes (int)</td>
-              <td>the length of the record on this page.</td>
-            </tr>
-            <tr> 
-              <td>2 bytes (unsigned short) or 4 bytes (int)</td>
-              <td>the length of the reserved number of bytes for this record on 
+        <table>
+          <caption>Slot Table Record</caption>
+          <tr>
+            <th>Size</th>
+            <th>Content</th>
+          </tr>
+          <tr>
+            <td>2 bytes (unsigned short) or 4 bytes (int)</td>
+            <td>page offset for the record that is assigned to the slot</td>
+          </tr>
+          <tr>
+            <td>2 bytes (unsigned short) or 4 bytes (int)</td>
+            <td>the length of the record on this page.</td>
+          </tr>
+          <tr>
+            <td>2 bytes (unsigned short) or 4 bytes (int)</td>
+            <td>the length of the reserved number of bytes for this record on 
+
                 this page.</td>
-            </tr>
-          </table>
-		  <p>
+          </tr>
+        </table>
+        <p>
+
           First slot is slot 0. The slot table grows backwards. Slots are never 
+
           left empty. </p>
       </section>
-      <section id="checksum"> 
+      <section id="checksum">
         <title>Checksum</title>
         <p>8 bytes of a java.util.zip.CRC32 checksum of the entire's page contents 
+
           without the 8 bytes representing the checksum.</p>
       </section>
-	</section>
-    <section id="allocpage"> 
+    </section>
+    <section id="allocpage">
       <title>Allocation Page</title>
       <p> An allocation page of the file container extends a normal Stored page, 
+
         with the exception that a hunk of space may be 'borrowed' by the file 
+
         container to store the file header.</p>
       <p> The borrowed space is not visible to the alloc page even though it is 
+
         present in the page data array. It is accessed directly by the FileContainer. 
+
         Any change made to the borrowed space is not managed or seen by the allocation 
+
         page.</p>
       <p> The reason for having this borrowed space is so that the container header 
+
         does not need to have a page of its own. </p>
-      <p> 
+      <p>
         <strong>Page Format</strong>
         <br/>
+
         An allocation page extends a stored page, the on disk format is different 
+
         from a stored page in that N bytes are 'borrowed' by the container and 
+
         the page header of an allocation page will be slightly bigger than a normal 
+
         stored page. This N bytes are stored between the page header and the record 
+
         space.</p>
       <p> The reason why this N bytes can't simply be a row is because it needs 
+
         to be statically accessible by the container object to avoid a chicken 
+
         and egg problem of the container object needing to instantiate an alloc 
+
         page object before it can be objectified, and an alloc page object needing 
+
         to instantiate a container object before it can be objectified. So this 
+
         N bytes must be stored outside of the normal record interface yet it must 
+
         be settable because only the first alloc page has this borrowed space. 
+
         Other (non-first) alloc page have N == 0. 
-        <br/>
-        <img src="alloc-page.png" alt=""/>
+
+        <br/><img alt="" src="alloc-page.png"/></p>
+      <p>
+
+	N is a byte that indicates the size of the borrowed space.  Once an alloc
+
+	page is initialized, the value of N cannot change.
+
+	</p>
+      <p>
+
+	The maximum space that can be borrowed by the container is 256 bytes.
+
+      </p>
+      <p>
+
+	The allocation pages are of the same page size as any other pages in the
+
+	container. The first allocation page of the FileContainer starts at the
+
+	first physical byte of the container.  Subsequent allocation pages are
+
+	chained via the nextAllocPageOffset.  Each allocation page is expected to
+
+	manage at least 1000 user pages (for 1K page size) so this chaining may not
+
+	be a severe performance hit.  The logical -> physical mapping of an
+
+	allocation page is stored in the previous allocation page.  The container
+
+	object will need to maintain this mapping.</p>
+      <p>
+
+	The following fields are stored in the page header:
+
+      </p>
+      <table>
+        <caption>
+
+                Format of Alloc Page
+
+            </caption>
+        <tr>
+          <th>
+
+                Type
+
+             </th>
+          <th>
+
+                Description
+
+            </th>
+        </tr>
+        <tr>
+          <td>
+
+                int
+
+            </td>
+          <td>
+
+                FormatId
+
+            </td>
+        </tr>
+        <tr>
+          <td>StoredPageHeader</td>
+          <td>see <a href="#storedpage">Stored Page Header</a></td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>nextAllocPageNumber - the next allocation page's number</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>nextAllocPageOffset - the file offset of the next allocation page</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>reserved1 - reserved for future usage</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>reserved2 - reserved for future usage</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>reserved3 - reserved for future usage</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>reserved4 - reserved for future usage</td>
+        </tr>
+        <tr>
+          <td>byte</td>
+          <td>N - the size of the borrowed container info</td>
+        </tr>
+        <tr>
+          <td>byte[N]</td>
+          <td>containerInfo - the content of the borrowed container info</td>
+        </tr>
+        <tr>
+          <td>AllocExtent</td>
+          <td>The one and only extent on this alloc page.</td>
+        </tr>
+      </table>
+      <p>
+
+	The allocation page contains allocation extent rows.  In this first cut
+
+	implementation, there is only 1 allocation extent row per allocation page.
+
+	</p>
+      <p>
+
+	The allocation extent row is an externalizable object and is directly
+
+	written on to the page by the alloc page.  In other words, it will not be
+
+	converted in to a storeableRow.  This is to cut down overhead, enhance
+
+	performance and gives more control of the size and layout of the allocation
+
+	extent row to the alloc page.
+
+	</p>
+      <section>
+        <title>
+
+	Alloc Page detailed implementation notes</title>
+        <p>
+
+	Create Container - an embryonic allocation page is formatted on disk by a
+
+	special static function to avoid instantiating a full AllocPage object.
+
+	This embryonic allocation has enough information that it can find the
+
+	file header and not much else.  Then the allocation page is properly
+
+	initialized by creating the first extent.
+
+      </p>
+        <p>
+
+	Open Container - A static AllocPage method will be used to read off the
+
+	container information directly from disk.  Even if
+
+	the first alloc page (page 0) is already in the page cache, it will not be
+
+	used because cleaning the alloc page will introduce a deadlock if the
+
+	container is not in the container cache.  Long term, the first alloc page
+
+	should probably live in the container cache rather than in the page cache.
+
+      </p>
+        <p>
+
+	Get Page - The first alloc page (page 0) will be read into the page cache.
+
+	Continue to follow the alloc page chain until the alloc page that manages
+
+	the specified page is found.  From the alloc page, the physical offset of
+
+	the specified page is located.
+
+      </p>
+        <p>
+
+	Cleaning alloc page - the alloc page is written out the same way any page
+
+	is written out.  The container object will provide a call back to the alloc
+
+	page to write the current version of the container object back into the
+
+	borrowed space before the alloc page itself is written out.
+
+	</p>
+        <p>
+
+	Cleaning the container object - get the the first alloc page, dirty it and
+
+	clean it (which will cause it to call the container object to write itself
+
+	out into the borrowed space).  The versioning of the container is
+
+	independent of the versioning of the alloc page.  The container version is
+
+	stored inside the borrowed space and is opaque to the alloc page.
+
+	</p>
+        <p>For the fields in an allocation extent row.</p>
+      </section>
+    </section>
+    <section>
+      <title>Allocation Extent</title>
+      <p>
+
+	An allocation extent row manages the page status of page in the extent.
+
+	AllocExtent is externalizable and is written to the AllocPage directly,
+
+	without being converted to a row first.
+
+	</p>
+      <table>
+        <caption>Format of Allocation Extent</caption>
+        <tr>
+          <th>Type</th>
+          <th>Description</th>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>extentOffset - the begin physical byte offset of the first page of this extent</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>extentStart - the first logical page mananged by this extent.</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>extentEnd - the last page this extent can ever hope to manage.</td>
+        </tr>
+        <tr>
+          <td>int</td>
+          <td>extentLength - the number of pages allocated in this extent</td>
+        </tr>
+        <tr>
+          <td>int</td>
+          <td>
+            <p>extentStatus - status bits for the whole extent.
+
+				<br/>HAS_DEALLOCATED - most likely, this extent has a deallocated 
+
+                        page somewhere. If !HAD_DEALLOCATED, the extent has no deallocated page.
+
+				<br/>HAS_FREE - most likely, this extent has a free page somewhere.
+
+						If !HAS_FREE, there is no free page in the extent.
+
+				<br/>ALL_FREE - most likely, this extent only has free pages, good 
+
+                        candidate for shrinking the file.
+
+						If !ALL_FREE, the extent is not all free.
+
+				<br/>HAS_UNFILLED_PAGES - most likely, this extent has unfilled pages.
+
+						if !HAS_UNFILLED_PAGES, all pages are filled.
+
+				<br/>KEEP_UNFILLED_PAGES - this extent keeps track of unfilled pages
+
+						(post v1.3).  If not set, this extent has no notion of
+
+						unfilled page and has no unFilledPage bitmap.
+
+				<br/>NO_DEALLOC_PAGE_MAP - this extents do not have a dealloc and a
+
+						free page bit maps.  Prior to 2.0, there are 2 bit
+
+						maps, a deallocate page bit map and a free page bit
+
+						map.  Cloudscape 2.0 and later merged the dealloc page
+
+						bit map into the free page bit map.
+
+				<br/>RETIRED - this extent contains only 'retired' pages, never use 
+
+                        any page from this extent.  The pages don't actually 
+
+                        exist, i.e., it maps to nothing (physicalOffset is 
+
+                        garbage).  The purpose of this extent is to blot out a 
+
+                        range of logical page numbers that no longer exists 
+
+                        for this container.  Use this to reuse a physical page
+
+                        when a logical page has exhausted all recordId or for
+
+                        logical pages that has been shrunk out.
+
+               </p>
+          </td>
+        </tr>
+        <tr>
+          <td>int</td>
+          <td>preAllocLength - the number of pages that have been preallocated</td>
+        </tr>
+        <tr>
+          <td>int</td>
+          <td>reserved1</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>reserved2 - reserved for future use</td>
+        </tr>
+        <tr>
+          <td>long</td>
+          <td>reserved3 - reserved for future use</td>
+        </tr>
+        <tr>
+          <td>FreePages(bit)</td>
+          <td>Bitmap of free pages. Bit[i] is ON if page is free for immediate (re)use.</td>
+        </tr>
+        <tr>
+          <td>unFilledPages(bit)</td>
+          <td>Bitmap of pages that have free space. Bit[i] is ON if page is likely to be &lt; 1/2 full.</td>
+        </tr>
+      </table>
+      <p>
+
+		org.apache.derby.iapi.services.io.FormatableBitSet is used to store the bit map.  
+
+            FormatableBitSet is an externalizable class.
+
+         </p>
+      <p>
+
+	A page can have the following logical state:
+
+	<br/>Free - a page that is free to be used
+
+	<br/>Valid - a page that is currently in use
+
+      </p>
+      <p>
+
+	There is another type of transitional pages which pages that have been
+
+	allocated on disk but has not yet been used.  These pages are Free.
+
+	</p>
+      <p>
+
+	Bit[K] freePages
+
+		Bit[i] is ON iff page i maybe free for reuse.  User must get the
+
+		dealloc page lock on the free page to make sure the transaction.
+
+	</p>
+      <p>
+
+	K is the size of the bit array, it must be >= length.
+
       </p>
     </section>
   </body>
-  <footer> 
-    <legal>This is a legal notice, so it is 
-      <strong>important</strong>
-      .</legal>
+  <footer>
+    <legal></legal>
   </footer>
 </document>



Mime
View raw message