From commits-return-25791-apmail-directory-commits-archive=directory.apache.org@directory.apache.org Thu May 13 22:58:22 2010 Return-Path: Delivered-To: apmail-directory-commits-archive@www.apache.org Received: (qmail 56127 invoked from network); 13 May 2010 22:58:22 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 13 May 2010 22:58:22 -0000 Received: (qmail 34322 invoked by uid 500); 13 May 2010 22:58:22 -0000 Delivered-To: apmail-directory-commits-archive@directory.apache.org Received: (qmail 34276 invoked by uid 500); 13 May 2010 22:58:22 -0000 Mailing-List: contact commits-help@directory.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@directory.apache.org Delivered-To: mailing list commits@directory.apache.org Received: (qmail 34269 invoked by uid 99); 13 May 2010 22:58:22 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 May 2010 22:58:22 +0000 X-ASF-Spam-Status: No, hits=-1353.1 required=10.0 tests=ALL_TRUSTED,AWL,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY X-Spam-Check-By: apache.org Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 May 2010 22:58:20 +0000 Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id o4DMw07V010136 for ; Thu, 13 May 2010 22:58:00 GMT Date: Thu, 13 May 2010 18:58:00 -0400 (EDT) From: confluence@apache.org To: commits@directory.apache.org Message-ID: <23071212.201.1273791480189.JavaMail.confluence@thor> Subject: [CONF] Apache Directory Server v1.5 > Master Table MIME-Version: 1.0 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Auto-Submitted: auto-generated

Master Table

Page edited by Emmanuel L=C3=A9charny

Comment: Fixed some info about MasterTable

Changes (9)

=20 =20
=
...
Somewhere in a partition base= d on two column data structures (i.e. BTree), we need to store entry object= s. This table will contain the entry objects with attributes and values as= the user provided them during add and modify operations. The server may i= nsert additional attributes such as operational attributes into these entri= es, however all the original attributes provided by the user are as they we= re including the case of their attribute identifiers.

Such a table stores entri= es with the Tuple key set to some unique identifier and the value set to a = serialized representation of the entry. The identifier must be unique, right now it's a long and will most likely be is sequential.

With the new ServerEntry API= , entries serialized into such a table will now contain their DN even if th= e DN is not exposed or stored as an attribute (like ou and objectClass) in = the serialized entry. This means such a table is all that is required to re= construct the entire partition and represents a master copy of the informat= ion regarding an entry. The rest of the information in the partition can b= e derived from this master copy.
With the new Entry API, entries serialized into such a table don'= t contain their DN in the serialized entry. This means we have to rebuild i= t using the RDN index in order to return the full Entry. The rest of the in= formation in the partition can be derived from this master copy.

Hence the MasterTable in= terface extends the Table interface to lookup entries by numeric identifier= s. Tuple keys are Long identifiers, and Tuple values are serialized ServerE= ntry objects.

The MasterTable also contain= s a means to store simple key value pair properties associated with the par= tition. These many be used to store configuration settings or other bits o= f information. The MasterTable also exposes access to a persistent sequenc= e counter used to generate new identifiers when adding new entries to the t= able.

{note}
At any point we may decide t= o expose the user provided DN embedded in serialized entries as an operatio= nal attribute for easy DN recovery specifically to help some client's c= ope with rather painful API's.
Using Long as identifier has the advantage of being simple, but the p= roblem is that we have to deal with the sequential aspect : we need to stor= e each new created number on the disk, which leads to an extra disk access = when creating an entry. Also this unique number will be local. We think tha= t using the entry UUID instead would be a better idea.
{note}

The MasterTable also contains a means to store simple key value pair = properties associated with the partition. These many be used to store conf= iguration settings or other bits of information.

The MasterTable= also exposes access to a persistent sequence counter used to generate new = identifiers when adding new entries to the table.


h1. JDBM MasterTable Implemen= tation

The JDBM based MasterTabl= e implementation, JdbmMasterTable, simply extends a JdbmTable and locks the= Tuple keys to use Long objects. A custom Serializer is used to marshal ServerEntry objects to and from this Table. The implementation uses a = property for persisting the current value of it's sequencial counter wh= ich starts at 1. There is no entry with value 0.

The properties object is = another BTree contained in the master.db file. The other BTree is the prim= ary one used to store Long identifiers mapped to serialized ServerEntry bytes.

There's not much to = the JdbmMasterTable implementation: it's pretty straight forward. Howe= ver future changes may change that: see below on how if you're curious.=
...

Full Content

MasterTable In= terface

Somewhere in a partition based on two column data structures (i.e. BTree= ), we need to store entry objects. This table will contain the entry objec= ts with attributes and values as the user provided them during add and modi= fy operations. The server may insert additional attributes such as operati= onal attributes into these entries, however all the original attributes pro= vided by the user are as they were including the case of their attribute id= entifiers.

Such a table stores entries with the Tuple key set to some unique identi= fier and the value set to a serialized representation of the entry. The id= entifier must be unique, right now it's a long and is sequential.

With the new Entry API, entries serialized into such a table don't conta= in their DN in the serialized entry. This means we have to rebuild it using= the RDN index in order to return the full Entry. The rest of the informati= on in the partition can be derived from this master copy.

Hence the MasterTable interface extends the Table interface to lookup en= tries by numeric identifiers. Tuple keys are Long identifiers, and Tuple va= lues are serialized ServerEntry objects.

3D""Using Long as identifier has the advanta= ge of being simple, but the problem is that we have to deal with the sequen= tial aspect : we need to store each new created number on the disk, which l= eads to an extra disk access when creating an entry. Also this unique numbe= r will be local. We think that using the entry UUID instead would be a bett= er idea.

The MasterTable also contains a means to store simple key value pair pro= perties associated with the partition. These many be used to store configu= ration settings or other bits of information.

The MasterTable also exposes access to a persistent sequence counter use= d to generate new identifiers when adding new entries to the table.

JDBM MasterTa= ble Implementation

The JDBM based MasterTable implementation, JdbmMasterTable, simply exten= ds a JdbmTable and locks the Tuple keys to use Long objects. A custom Seri= alizer is used to marshal Entry objects to and from this Table. The implem= entation uses a property for persisting the current value of it's sequencia= l counter which starts at 1. There is no entry with value 0.

The properties object is another BTree contained in the master.db file. = The other BTree is the primary one used to store Long identifiers mapped t= o serialized Entry bytes.

There's not much to the JdbmMasterTable implementation: it's pretty stra= ight forward. However future changes may change that: see below on how if = you're curious.

Potential Changes in Future: Nested Partition Identifier

In the future, the MasterTable may be altered to contain an n-bit partit= ion identifier. This identifier would be combined with the 64-bit Long gen= erated from the sequential counter. Instead of using the full 64-bits of t= he Long to generate values as the identifier, perhaps only 48-bits will be = used for the maximum capacity of the MasterTable. The other 16-bits can th= en be used for the partition identifier. Perhaps the Long can be left bit = shifted by 16, then binary OR'd with this 16-bit identifier (a Java Short).= The result would be the identifier for entry.

3D""A 16-bit (Java Short) for the partit= ion identifier allows for a total of 65536 different partitions to be mount= ed in a server. A 48-bit number allows for 281,474,976,710,656 (281 trilli= on) entries per partition. That's a total limit of 18,446,744,073,709,551,6= 16 entries per server which is 18.45x10^18^ which essentially is the same a= s the limit of a 64-bit Java Long (2^64^).

Ramifications

If the server uniquely assigns this 16-bit identifier to partitions and = hence their MasterTables, then the entry identifier can now be exposed by p= artitions so every entry in the server can be uniquely identified regardles= s of the partition used to store it. The partitions then are still free to= use their own scheme to generate 64-bit identifiers so long as the last 16= -bits are equal to this partition id.

One problem with the present configuration is the lack of shared indices= or the ability to present several indices on the same attribute across par= titions as one. Effectively this implements index partitioning. We cannot= do this today because of the potential for identifier overlap across parti= tions and indices use this potentially overlapping primary identifier to re= ference entries in the master table. With server unique identifiers across= partitions this problem goes away.

So what's the big deal? What does this capability give us? It will all= ow for various enhanced features never possible before.

Aliases Wil= l Work Across Partitions

Partitions can now expose access to a set of indices. The indices used = for alias tracking may be accessed to allow search operations to be conduct= ed across partitions while allowing for alias dereferencing.

Searc= h Can Be Extracted And/Or Delegated

The search algorithm can be conducted outside of partitions, delegated t= o partitions, or combined. Effectively the search algorithm is externalize= d. Even if some partitions attached to the server are required to conduct = search themselves, the overall search operation across partitions can be ce= ntralized. Some partitions may never need to provide search capabilities a= nd allow the server to handle searching via their indices.

Shared Entry Cache

When search is conducted outside of a partition a shared entry cache can= be effectively used. Modification timestamp indices exposed by partitions = can be used to determine cache evacuations. Furthermore a shared entry cac= he allows the server to better manage resources. Also because identifiers = can easily be associated with specific partitions a partition quota can be = enforced in the shared cache to achieve the same granularity of control ove= r caching.

Virtualization

Indices are the key to conducting joins. A virtual index could be expos= ed on a virtual attribute by a virtualization subsystem coupled with the se= arch engine. These virtual attributes may be computed or extracted from ex= ternal systems. This way search filters can contain virtual attributes whi= ch are properly evaluated.

The virtual subsystem may also be involved, on entry return, to modify e= ntries after retrieval from a partition. This way, the attributes of sever= al entries, perhaps in the same partition or across partitions may be joine= d to present a unified view.

The possibilities here are less explored however once search is decompos= ed, centralized, and indices can be accessed across partitions, most limita= tions are in the imagination.