directory-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From conflue...@apache.org
Subject [CONF] Apache Directory Server v1.5 > Master Table
Date Tue, 21 Jun 2011 13:30:00 GMT
<html>
<head>
    <base href="https://cwiki.apache.org/confluence">
            <link rel="stylesheet" href="/confluence/s/2042/9/1/_/styles/combined.css?spaceKey=DIRxSRVx11&amp;forWysiwyg=true"
type="text/css">
    </head>
<body style="background: white;" bgcolor="white" class="email-body">
<div id="pageContent">
<div id="notificationFormat">
<div class="wiki-content">
<div class="email">
    <h2><a href="https://cwiki.apache.org/confluence/display/DIRxSRVx11/Master+Table">Master
Table</a></h2>
    <h4>Page <b>edited</b> by             <a href="https://cwiki.apache.org/confluence/display/~elecharny">Emmanuel
L├ęcharny</a>
    </h4>
        <div id="versionComment">
        <b>Comment:</b>
        ta<br />
    </div>
        <br/>
                         <h4>Changes (11)</h4>
                                 
    
<div id="page-diffs">
                    <table class="diff" cellpadding="0" cellspacing="0">
    
            <tr><td class="diff-added-lines" style="background-color: #dfd;">h1.
Introduction <br>The *Master table* is the storage for LDAP entries. This is where we
store a serialized version of each entry added by all the LDAP users.. Every other tables
are indexes used to retrieve easily entries from this master table. <br> <br>This
table is absolutely critical if we want to rebuild the full database, if we have lost some
index. However, it requires some dedicated tool to do so, as we don&#39;t store each entry&#39;s
DN into the serialized entry : we only store the entry&#39;s RDN and it&#39;s parent&#39;s
ID. <br> <br>But basically, with only the MasterTable, plus a list of indexes
to build, we can rebuild the full database. <br> <br>Let&#39;s now see what&#39;s
the MasterTable made of. <br> <br></td></tr>
            <tr><td class="diff-unchanged" >h1. MasterTable Interface <br>
<br></td></tr>
            <tr><td class="diff-changed-lines" >Somewhere in a partition based
on <span class="diff-deleted-words"style="color:#999;background-color:#fdd;text-decoration:line-through;">two
column</span> <span class="diff-added-words"style="background-color: #dfd;">a
hierarchical</span> data <span class="diff-changed-words">structure<span class="diff-deleted-chars"style="color:#999;background-color:#fdd;text-decoration:line-through;">s
(i.e. BTree)</span>,</span> we need to store entry objects.  This table will contain
the entry objects with <span class="diff-added-words"style="background-color: #dfd;">all
yje</span> attributes and values as the user provided them during add and modify operations.
 The server may insert additional attributes such as operational attributes into these entries,
however all the original attributes provided by the user are as they were including the case
of their attribute identifiers. <br></td></tr>
            <tr><td class="diff-unchanged" > <br></td></tr>
            <tr><td class="diff-deleted-lines" style="color:#999;background-color:#fdd;text-decoration:line-through;">Such
a table stores entries with the Tuple key set to some unique identifier and the value set
to a serialized representation of the entry.  The identifier must be unique, right now it&#39;s
a long and is sequential.  <br></td></tr>
            <tr><td class="diff-added-lines" style="background-color: #dfd;">Such
a table stores entries with a key set to some unique identifier and the value set to a serialized
representation of the entry.  The identifier must be unique, right now it&#39;s a long
and is sequential (that means we are somehow &#39;limited&#39; to 2^64 - 1 entries).
 <br></td></tr>
            <tr><td class="diff-unchanged" > <br></td></tr>
            <tr><td class="diff-deleted-lines" style="color:#999;background-color:#fdd;text-decoration:line-through;">With
the new Entry API, entries serialized into such a table don&#39;t contain their DN in
the serialized entry. This means we have to rebuild it using the RDN index in order to return
the full Entry. The rest of the information in the partition can be derived from this master
copy.  <br></td></tr>
            <tr><td class="diff-added-lines" style="background-color: #dfd;">With
the new Entry API, entries serialized into such a table don&#39;t contain their DN in
the serialized entry. This means we have to rebuild it using the RDN stored in the entry and
the parentId we also stored into the entry to return the full Entry. The rest of the information
in the partition can be derived from this master table.  <br></td></tr>
            <tr><td class="diff-unchanged" > <br></td></tr>
            <tr><td class="diff-deleted-lines" style="color:#999;background-color:#fdd;text-decoration:line-through;">Hence
the MasterTable interface extends the Table interface to lookup entries by numeric identifiers.
Tuple keys are Long identifiers, and Tuple values are serialized ServerEntry objects. <br></td></tr>
            <tr><td class="diff-added-lines" style="background-color: #dfd;">Hence
the MasterTable interface extends the Table interface to lookup entries by numeric identifiers.
Each entry is stored and manipulated as a *Tuple*, which is a couple of data : the key and
the value, of which the keys are Long identifiers, and the values are serialized Entry objects.
<br></td></tr>
            <tr><td class="diff-unchanged" > <br>{note} <br>Using
Long as identifier has the advantage of being simple, but the problem is that we have to deal
with the sequential aspect : we need to store each new created number on the disk, which leads
to an extra disk access when creating an entry. Also this unique number will be local. We
think that using the entry UUID instead would be a better idea. <br></td></tr>
            <tr><td class="diff-added-lines" style="background-color: #dfd;">
<br>It also implies we have a lock on the incremental counter. Adding or deleting entries
are therefore costly operations. <br></td></tr>
            <tr><td class="diff-unchanged" >{note} <br> <br>The MasterTable
also contains a means to store simple key value pair properties associated with the partition.
 These many be used to store configuration settings or other bits of information.   <br>
<br></td></tr>
            <tr><td class="diff-changed-lines" >The MasterTable also exposes access
to a persistent sequence counter used to generate new identifiers when adding new entries
to the <span class="diff-changed-words">table<span class="diff-added-chars"style="background-color:
#dfd;"> (the Entry identifier, a Long)</span>.</span> <br></td></tr>
            <tr><td class="diff-unchanged" > <br> <br>h1. JDBM MasterTable
Implementation <br> <br></td></tr>
            <tr><td class="diff-changed-lines" >The JDBM based MasterTable implementation,
JdbmMasterTable, simply extends a JdbmTable and locks the Tuple keys to use Long objects.
 A custom Serializer is used to marshal Entry objects to and from this Table.  The implementation
uses a property for persisting the current value of it&#39;s <span class="diff-changed-words">sequen<span
class="diff-deleted-chars"style="color:#999;background-color:#fdd;text-decoration:line-through;">c</span><span
class="diff-added-chars"style="background-color: #dfd;">t</span>ial</span>
counter which starts at 1.  There is no entry with value 0. <br></td></tr>
            <tr><td class="diff-unchanged" > <br>The properties object is
another BTree contained in the master.db file.  The other BTree is the primary one used to
store Long identifiers mapped to serialized Entry bytes.  <br></td></tr>
            <tr><td class="diff-snipped" >...<br></td></tr>
    
            </table>
    </div>                            <h4>Full Content</h4>
                    <div class="notificationGreySide">
        <h1><a name="MasterTable-Introduction"></a>Introduction</h1>
<p>The <b>Master table</b> is the storage for LDAP entries. This is where
we store a serialized version of each entry added by all the LDAP users.. Every other tables
are indexes used to retrieve easily entries from this master table.</p>

<p>This table is absolutely critical if we want to rebuild the full database, if we
have lost some index. However, it requires some dedicated tool to do so, as we don't store
each entry's DN into the serialized entry : we only store the entry's RDN and it's parent's
ID.</p>

<p>But basically, with only the MasterTable, plus a list of indexes to build, we can
rebuild the full database.</p>

<p>Let's now see what's the MasterTable made of.</p>

<h1><a name="MasterTable-MasterTableInterface"></a>MasterTable Interface</h1>

<p>Somewhere in a partition based on a hierarchical data structure, we need to store
entry objects.  This table will contain the entry objects with all yje attributes and values
as the user provided them during add and modify operations.  The server may insert additional
attributes such as operational attributes into these entries, however all the original attributes
provided by the user are as they were including the case of their attribute identifiers. </p>

<p>Such a table stores entries with a key set to some unique identifier and the value
set to a serialized representation of the entry.  The identifier must be unique, right now
it's a long and is sequential (that means we are somehow 'limited' to 2^64 - 1 entries). </p>

<p>With the new Entry API, entries serialized into such a table don't contain their
DN in the serialized entry. This means we have to rebuild it using the RDN stored in the entry
and the parentId we also stored into the entry to return the full Entry. The rest of the information
in the partition can be derived from this master table. </p>

<p>Hence the MasterTable interface extends the Table interface to lookup entries by
numeric identifiers. Each entry is stored and manipulated as a <b>Tuple</b>, which
is a couple of data : the key and the value, of which the keys are Long identifiers, and the
values are serialized Entry objects.</p>

<div class='panelMacro'><table class='noteMacro'><colgroup><col width='24'><col></colgroup><tr><td
valign='top'><img src="/confluence/images/icons/emoticons/warning.gif" width="16" height="16"
align="absmiddle" alt="" border="0"></td><td>Using Long as identifier has the
advantage of being simple, but the problem is that we have to deal with the sequential aspect
: we need to store each new created number on the disk, which leads to an extra disk access
when creating an entry. Also this unique number will be local. We think that using the entry
UUID instead would be a better idea.

<p>It also implies we have a lock on the incremental counter. Adding or deleting entries
are therefore costly operations.</p></td></tr></table></div>

<p>The MasterTable also contains a means to store simple key value pair properties associated
with the partition.  These many be used to store configuration settings or other bits of information.
 </p>

<p>The MasterTable also exposes access to a persistent sequence counter used to generate
new identifiers when adding new entries to the table (the Entry identifier, a Long).   </p>


<h1><a name="MasterTable-JDBMMasterTableImplementation"></a>JDBM MasterTable
Implementation</h1>

<p>The JDBM based MasterTable implementation, JdbmMasterTable, simply extends a JdbmTable
and locks the Tuple keys to use Long objects.  A custom Serializer is used to marshal Entry
objects to and from this Table.  The implementation uses a property for persisting the current
value of it's sequential counter which starts at 1.  There is no entry with value 0. </p>

<p>The properties object is another BTree contained in the master.db file.  The other
BTree is the primary one used to store Long identifiers mapped to serialized Entry bytes.
</p>

<p>There's not much to the JdbmMasterTable implementation: it's pretty straight forward.
 However future changes may change that: see below on how if you're curious.</p>

<h1><a name="MasterTable-PotentialChangesinFuture%3ANestedPartitionIdentifier"></a>Potential
Changes in Future: Nested Partition Identifier</h1>

<p><a name="MasterTable-partitionid"></a></p>

<p>In the future, the MasterTable may be altered to contain an n-bit partition identifier.
 This identifier would be combined with the 64-bit Long generated from the sequential counter.
 Instead of using the full 64-bits of the Long to generate values as the identifier, perhaps
only 48-bits will be used for the maximum capacity of the MasterTable.  The other 16-bits
can then be used for the partition identifier.  Perhaps the Long can be left bit shifted by
16, then binary OR'd with this 16-bit identifier (a Java Short).  The result would be the
identifier for entry.</p>

<div class='panelMacro'><table class='infoMacro'><colgroup><col width='24'><col></colgroup><tr><td
valign='top'><img src="/confluence/images/icons/emoticons/information.gif" width="16"
height="16" align="absmiddle" alt="" border="0"></td><td>A 16-bit (Java Short)
for the partition identifier allows for a total of 65536 different partitions to be mounted
in a server.  A 48-bit number allows for 281,474,976,710,656 (281 trillion) entries per partition.
That's a total limit of 18,446,744,073,709,551,616 entries per server which is 18.45x10^18^
which essentially is the same as the limit of a 64-bit Java Long (2^64^).</td></tr></table></div>

<h2><a name="MasterTable-Ramifications"></a>Ramifications</h2>

<p>If the server uniquely assigns this 16-bit identifier to partitions and hence their
MasterTables, then the entry identifier can now be exposed by partitions so every entry in
the server can be uniquely identified regardless of the partition used to store it.  The partitions
then are still free to use their own scheme to generate 64-bit identifiers so long as the
last 16-bits are equal to this partition id.</p>

<p>One problem with the present configuration is the lack of shared indices or the ability
to present several indices on the same attribute across partitions as one.  Effectively this
implements index partitioning.  We cannot do this today because of the potential for identifier
overlap across partitions and indices use this potentially overlapping primary identifier
to reference entries in the master table.  With server unique identifiers across partitions
this problem goes away.</p>

<p>So what's the big deal?  What does this capability give us?  It will allow for various
enhanced features never possible before.</p>

<h3><a name="MasterTable-AliasesWillWorkAcrossPartitions"></a>Aliases Will
Work Across Partitions</h3>

<p>Partitions can now expose access to a set of indices.  The indices used for alias
tracking may be accessed to allow search operations to be conducted across partitions while
allowing for alias dereferencing. </p>

<h3><a name="MasterTable-SearchCanBeExtractedAnd%2FOrDelegated"></a>Search
Can Be Extracted And/Or Delegated</h3>

<p>The search algorithm can be conducted outside of partitions, delegated to partitions,
or combined.  Effectively the search algorithm is externalized.  Even if some partitions attached
to the server are required to conduct search themselves, the overall search operation across
partitions can be centralized.  Some partitions may never need to provide search capabilities
and allow the server to handle searching via their indices.  </p>

<h3><a name="MasterTable-SharedEntryCache"></a>Shared Entry Cache</h3>

<p>When search is conducted outside of a partition a shared entry cache can be effectively
used. Modification timestamp indices exposed by partitions can be used to determine cache
evacuations.  Furthermore a shared entry cache allows the server to better manage resources.
 Also because identifiers can easily be associated with specific partitions a partition quota
can be enforced in the shared cache to achieve the same granularity of control over caching.
</p>

<h3><a name="MasterTable-Virtualization"></a>Virtualization</h3>

<p>Indices are the key to conducting joins.  A virtual index could be exposed on a virtual
attribute by a virtualization subsystem coupled with the search engine.  These virtual attributes
may be computed or extracted from external systems.  This way search filters can contain virtual
attributes which are properly evaluated.  </p>

<p>The virtual subsystem may also be involved, on entry return, to modify entries after
retrieval from a partition.  This way, the attributes of several entries, perhaps in the same
partition or across partitions may be joined to present a unified view.</p>

<p>The possibilities here are less explored however once search is decomposed, centralized,
and indices can be accessed across partitions, most limitations are in the imagination. </p>
    </div>
        <div id="commentsSection" class="wiki-content pageSection">
        <div style="float: right;">
            <a href="https://cwiki.apache.org/confluence/users/viewnotifications.action"
class="grey">Change Notification Preferences</a>
        </div>
        <a href="https://cwiki.apache.org/confluence/display/DIRxSRVx11/Master+Table">View
Online</a>
        |
        <a href="https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=80389&revisedVersion=10&originalVersion=9">View
Changes</a>
            </div>
</div>
</div>
</div>
</div>
</body>
</html>

Mime
View raw message