Creating a separate thread on the primary topic of virtualization.

To achieve virtualization I have a specific idea in mind, although it will most likely not be implemented before 2.0.


(1) Server Wide Entry ID (not UUID)

First, I'd like to see the server create the entry IDs for partitions and expose access to them.  Right now entry IDs are specific to a partition implementation and are not exposed to higher levels.  I would like to see these IDs exposed so they can be leveraged within the server, above the partitions.  Note there's a Confluence page about making server-wide composite IDs which use some bits to associate the entry with its partition.
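A minimal sketch of the composite ID idea, assuming a 64-bit ID whose upper 16 bits name the partition and whose lower 48 bits hold the partition-local ID.  The bit split and the class and method names are assumptions made for illustration; they are not taken from the Confluence page or from existing server code.

```java
/** Hypothetical server-wide composite ID: [16 bits partition | 48 bits local ID]. */
final class CompositeId {
    static final int PARTITION_BITS = 16;                   // assumed split
    static final int LOCAL_BITS = 64 - PARTITION_BITS;      // 48
    static final long LOCAL_MASK = (1L << LOCAL_BITS) - 1;

    private CompositeId() {
    }

    /** Packs a partition number and a partition-local ID into one long. */
    static long compose(int partitionId, long localId) {
        if (partitionId < 0 || partitionId >= (1 << PARTITION_BITS)) {
            throw new IllegalArgumentException("partitionId out of range: " + partitionId);
        }
        if (localId < 0 || localId > LOCAL_MASK) {
            throw new IllegalArgumentException("localId out of range: " + localId);
        }
        return ((long) partitionId << LOCAL_BITS) | localId;
    }

    /** Extracts the partition number from the upper bits. */
    static int partitionOf(long compositeId) {
        return (int) (compositeId >>> LOCAL_BITS);
    }

    /** Extracts the partition-local ID from the lower bits. */
    static long localOf(long compositeId) {
        return compositeId & LOCAL_MASK;
    }
}
```

With a layout like this, any layer above the partitions can tell which partition owns an entry from the ID alone, without asking the partitions.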

Even if a partition is not index based, as JDBM is and LDIF will be, it can still accept a server-provided ID for the entries it creates.  The partition remains in charge of what is created, as long as it associates the provided identifier with the entry.  This will allow the virtual subsystem to build indices itself if needed, both for caching and for precomputing virtual values.

This change will also enable other kinds of features such as:

   (a) partition nesting
   (b) hashed entry partitioning, where multiple partitions together hold, for example, 500 million user entries under a single parent
   (c) a root (default) partition to store things like the RootDSE and DIT-wide subentries
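For (b), one possible routing scheme is sketched below: the partition for a new entry is chosen by hashing its normalized RDN.  The class name, the use of String.hashCode(), and the assumption that the RDN is already normalized are all illustrative choices, not part of the proposal.

```java
/** Hypothetical router that spreads the children of one parent across N partitions. */
final class HashedPartitionRouter {
    private final int partitionCount;

    HashedPartitionRouter(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /**
     * Returns the index of the partition that should hold the entry.
     * The RDN is assumed to be normalized already, so equal RDNs hash equally.
     */
    int partitionFor(String normalizedRdn) {
        // floorMod keeps the result in [0, partitionCount) even for negative hashes.
        return Math.floorMod(normalizedRdn.hashCode(), partitionCount);
    }
}
```

A scheme this simple cannot change the partition count without moving entries; that trade-off would need to be settled separately.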


(2) Schema Extension for Virtual Attributes

A new schema extension will be created to mark attributeTypes as virtual.  This marker simply lets the search engine know that it must consult the virtual subsystem when conducting searches whose filters contain virtual attributes.  VIRTUAL might be the best keyword for it.
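The check the search engine would make can be sketched as follows, assuming the set of attributeTypes carrying the marker has already been read from the schema and is stored in lower case.  The class and method names are hypothetical, and the filter is reduced to a flat list of attribute names for brevity.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: decides whether a filter touches any virtual attributeType. */
final class VirtualMarkerCheck {
    private VirtualMarkerCheck() {
    }

    /**
     * @param filterAttributes     attribute names used in the search filter
     * @param virtualAttributeTypes lower-cased names of attributeTypes marked virtual
     * @return true if the virtual subsystem must be consulted for this search
     */
    static boolean needsVirtualSubsystem(List<String> filterAttributes,
                                         Set<String> virtualAttributeTypes) {
        for (String attribute : filterAttributes) {
            if (virtualAttributeTypes.contains(attribute.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```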


(3) Virtual Subsystem Interface

An interface is needed to ask specific questions about sets of entries, or about specific entry candidates, using IDs.  This interface resembles an Index in the BTree-based partitions.  You can ask for the value of a virtual attribute for an entry and get the result, much like an index lookup.  Behind the scenes the virtual subsystem may compute the value, look up a cached value, read the value from disk, or access some external store.  In this process, the partition containing the entry may be accessed to acquire more information.  Since this will most likely be built into the Xdbm search engine, all partitions involved will most likely have indices and will be able to expose them.
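A rough sketch of the lookup side of such an interface, with one caching implementation.  Everything here is an assumption made for illustration: the interface name, the single-valued String result, and the idea of registering one compute function per attributeType.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongFunction;

/** Hypothetical lookup contract: ask for a virtual attribute value by entry ID. */
interface VirtualSubsystem {
    Optional<String> lookup(long entryId, String attributeType);
}

/** Computes values on demand and caches them, so repeat lookups behave like index hits. */
final class CachingVirtualSubsystem implements VirtualSubsystem {
    private final Map<String, LongFunction<String>> computers = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();
    int computeCount = 0; // exposed only to make the caching visible in this sketch

    /** Registers the function that computes the given virtual attribute from an entry ID. */
    void register(String attributeType, LongFunction<String> computer) {
        computers.put(attributeType, computer);
    }

    @Override
    public Optional<String> lookup(long entryId, String attributeType) {
        LongFunction<String> computer = computers.get(attributeType);
        if (computer == null) {
            return Optional.empty(); // not a virtual attribute known to this subsystem
        }
        String key = attributeType + "#" + entryId;
        String value = cache.get(key);
        if (value == null) {
            value = computer.apply(entryId); // may be expensive: compute or remote call
            computeCount++;
            if (value != null) {
                cache.put(key, value);
            }
        }
        return Optional.ofNullable(value);
    }
}
```

The caller sees the same shape as an index lookup, whether the value was computed, cached, or fetched from elsewhere.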

In addition to lookup requests for specific entries, Cursors can be acquired to access sets of entries satisfying some assertion on the virtual attribute.  This virtual Cursor can be used by the search engine to build a Cursor system incorporating the virtual assertions into the search result.  Of course, since virtualization can be expensive (computing or going over the network), these assertions on virtual attributes will have the lowest priority.  The idea is to limit the amount of computation we need to do by making the search space as small as possible before they are evaluated.  The presence of the schema extension designating attributeTypes as virtual will allow us to do this.
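The effect of giving virtual assertions the lowest priority can be sketched like this: the cheap, indexed assertion is tested first, and the virtual assertion only runs on the candidates that survive.  The class name and the use of plain predicates in place of real Cursors are simplifications for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Hypothetical evaluation order: indexed assertion first, virtual assertion last. */
final class VirtualLastEvaluation {
    private VirtualLastEvaluation() {
    }

    /**
     * Returns the IDs satisfying both assertions. The short-circuit && guarantees
     * the (expensive) virtual assertion is never evaluated for entries that the
     * indexed assertion has already rejected.
     */
    static List<Long> search(long[] candidateIds,
                             LongPredicate indexedAssertion,
                             LongPredicate virtualAssertion) {
        List<Long> results = new ArrayList<>();
        for (long id : candidateIds) {
            if (indexedAssertion.test(id) && virtualAssertion.test(id)) {
                results.add(id);
            }
        }
        return results;
    }
}
```

In a real Cursor system the indexed assertion would drive the iteration rather than filter a full scan, but the ordering principle is the same.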

----------------

With this configuration, the search engine will compose a system of cursors built to reflect the search filter, consulting the virtual subsystem to perform lookups and request virtual cursors.  Once the virtual cursors are incorporated into the system of cursors, the product is bubbled back up the system to be used as before.  Other layers of the server are not impacted, and they no longer need to worry about injecting virtual attributes (e.g. the collective system injects them manually today).

I know this might not be perfect, and several tweaks and optimizations will be required.  However, I think this is a design we can work on to achieve a solid means of dealing with virtual attributes in search.  It does not, however, explain how we are going to allow for user-specified virtualization, or manage it in general.  This is just a first step to get lookups and search on virtual attributeTypes working.

Comments?

Alex