From commits-return-33863-apmail-directory-commits-archive=directory.apache.org@directory.apache.org Mon Mar 26 15:14:28 2012 Return-Path: X-Original-To: apmail-directory-commits-archive@www.apache.org Delivered-To: apmail-directory-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8823B9B17 for ; Mon, 26 Mar 2012 15:14:28 +0000 (UTC) Received: (qmail 49620 invoked by uid 500); 26 Mar 2012 15:14:28 -0000 Delivered-To: apmail-directory-commits-archive@directory.apache.org Received: (qmail 49584 invoked by uid 500); 26 Mar 2012 15:14:28 -0000 Mailing-List: contact commits-help@directory.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@directory.apache.org Delivered-To: mailing list commits@directory.apache.org Received: (qmail 49577 invoked by uid 99); 26 Mar 2012 15:14:28 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Mar 2012 15:14:28 +0000 X-ASF-Spam-Status: No, hits=-1994.3 required=5.0 tests=ALL_TRUSTED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY X-Spam-Check-By: apache.org Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Mar 2012 15:14:22 +0000 Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id q2QFE0Tn018460 for ; Mon, 26 Mar 2012 15:14:00 GMT Date: Mon, 26 Mar 2012 11:14:00 -0400 (EDT) From: confluence@apache.org To: commits@directory.apache.org Message-ID: <12812869.52984.1332774840028.JavaMail.confluence@thor> Subject: [CONF] Apache Directory Server v1.5 > Structure and Organization MIME-Version: 1.0 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Auto-Submitted: auto-generated

Structure and Organization

Page edited by Emmanuel L=C3=A9charny


Changes (1)

=20 =20
=20 =20
...
h4. OneLevel index removal
We don't need the One= Level index at all, as we can use the Rdn index to get the very same result= : the list of all the children for a specific entry. The Rdn index can be = browsed to get all the ID for an entry like <parentID, *>. For instan= ce, if we are looking for all the entries under the *ou=3DSales* entry, we = will first get this entry ID (ie, 2), and fetch for all the elements where = the key is <2, *>. That will give us back \= {cn=3DJOnny WAlkeR, cn=3DJIM BEAN= \}.

At the end of the day, i= t's just a matter of being able to process the <parentID, *> spec= ial element.
...

Full Content

Introduct= ion

We already covered some of the basic structural aspects in a 2 column Ta= ble based partition design in the other sections. It should be clear how in= dices store Long pointers into the MasterTable to access entries with speci= fic attribute values. Indices prevent the need for full MasterTable scans = which would entail a massive volume of expensive lookup, deserialization, n= ormalization and comparison operations.

This document will cover higher level details such as the default system= indices used for common operations and how these indices along with option= al user indices work together to efficiently search Table based partitions = using LDAP filters. Namespace and hierarchy maintenance aspects are also co= vered.

System Indices

Legend of Object Identifiers

OID Aliases
2.5.4.3 cn,commonName
2.5.4.11 ou,organizationalUnitName

System indices are required indices. Table based partitions need these = indices to properly manage entries in the LDAP namespace, and the entry hie= rarchy. These indices are the minimum indices required to properly conduct= search operations on the part of the directory information base stored in = the partition. Just for easy referral we show the example tree here again = for index content examples.

ParentIdAndRdn (called Rdn) Index

This index associates the RDN of each entry with it's entry ID in the as= ter table. It also stores the hierarchical structure of the whole DIT. In o= rder to do that, we keep a track to the parent's ID in the key.

The key is then a composition of the entry's RDN and its parent's ID.

We have a specific case : a context entry (which has a naming context in= the rootDSE) is using the full DN and a special marker as the parent ID (n= ote that as we don't know the type of the ID, we will have a specific value= depending on the iD type : 0 if the ID is a long, a special UUID if the ID= is an UUID, etc)

We don't necessarily store only a RDN in this data structure : when we a= re dealing with a naming context, which may have a DN with more than one RD= N, we treat this DN as if it was a unique RDN.

For instance, if 'dc=3Dexample, dc=3Dorg' is a naming context, then its = key will be <'dc=3Dexample, dc=3Dorg', 0>.

On= e Level (onelevel) Index

The one level index maps parent entry identifiers from the master to the= identifiers of their children. This index is used to facilitate various na= me space operations like renaming, and moving branches which impacts childr= en. It is also used by the search engine to conduct one level search reque= sts.

Su= b Level (sublevel) Index

This index maps ancestor entry identifiers from the master to the identi= fiers of all their descendants including immediate children. It does not m= ap the descendants of the context entry (at the root) of the partition. Th= is is just unnecessary since all other entries in the partition satisfy thi= s condition. If this is something we desire to enumerate we can get a reve= rse Cursor on the ndn index and advance past the first entry, to start enum= erating all the descendant identifiers of the context entry. Note that thi= s index contains the <id,id> tuple in order to include the added entr= y itself to the SUBTREE scope search.

This index plays a critical role in subtree search. It allows the searc= h optimizer to detect a count and annotate the modified search filter for s= ubtree scope nodes. It also allows a Cursor to enumerate the set of descen= dants associated with an entry. Without this index, the search engine must= resort to expensive DN based operations for subtree scope constraint check= ing.

Other Untapped Advantages
Besides= helping with search, the forward LUT of the sublevel index can be used to = turn recursive methods for rename and move operations into iterative operat= ions.

Note that the reverse LUT of this index maps entries to all their ancest= ors minus the context entry. This can be used to quickly walk the lineage o= f an entry in the tree. This may be useful for certain operations.

3D""Digram needs to be updated to reflect= the changes done in StoreUtils.java

OneLevel/SubLevel index removal

We have created those two indexes back in a time we were using UpDN and = NormDn table, indead of RDN table. Now that we are using the RDN index, we = don't need those two index.

Let's see with an exemple how we can get rid of those two indexes.

OneLevel = index removal

We don't need the OneLevel index at all, as we can use the Rdn index to = get the very same result : the list of all the children for a specific entr= y. The Rdn index can be browsed to get all the ID for an entry like <par= entID, >. For instance, if we are looking for all the entries under t= he *ou=3DSales entry, we will first get this entry ID (ie, 2), and fetc= h for all the elements where the key is <2, *>. That will give us bac= k {cn=3DJOnny WAlkeR, cn=3DJIM BEAN}.

At the end of the day, it's just a matter of being able to process the &= lt;parentID, *> special element.

SubLevel remov= al

We can also get rid of this index, and use the ParentIdAndRdn index inst= ead. The algorithm is a bit more complex that the one we use for the OneLev= el index, as we will have to recurse on all the entries having at least one= child.

In order to ease this computation, we will add two values in the ParentI= dAndRdn data structure :

    =09
  • the nbChildren count
  • =09
  • the nbDescendants count.

The fist one counts the number of direct children on each entry (0 means= the entry does not have children)
The second one counts the number of its descendant (its children, the child= ren of its children, ...)

Those two values are updated whenever the DiT is modified (addition, rem= oval or move).

Exis= tence/Presence Index

The existence index maps the OID of only indexed attributeTypes, to the = identifiers in the master for entries containing one or more values for the= attribute. All attributeTypes are not tracked. When an index is created = on ou it will contain IndexEntry objects for 'ou' in this existence index. = Note the OID for attributeTypes is the normalized form for attribute type = identifiers.

Remember for our example DIT's partition we i= ndexed objectClass, ou, and cn.

The existence index is used to conduct searches using the existence oper= ator =3D*. For example a search with the following filter, (cn=3D= 2;), returns entries 5, 6, and 7. The search engine acquires a Cursor = and positions it just before the key '2.5.4.11' which is the unambiguous re= presentation of 'cn' or 'commonName'. It then walks (advances one step at = a time) the IndexEntry objects for this key, and stops the walk as soon as = this key's values are exhausted. At each advance, the entry identifier is = used to lookup the entry in the master table and return it.

Notice the existence index does not add the OID for the objectClass attr= ibute. This is specifically avoided since all entries contain the objectCl= ass attribute. There is no reason for this index to bloat by including it.=

3D""I 'think' the objectClass attr= ibute is not considered for existence indices however this may not be the c= ase. We need to determine this and make sure we do NOT add objectCl= ass IndexEntrys to the existence index.

Alias Indices
Unknown macro: {warn}=20

Those index might be removed.

Alias Sche= ma Elements

Alias Impa= ct on Search

Alias entries cause cycles in a DIT partition which would otherwise be a= perfect tree with the context entry at the root. LDAP search operations ar= e conducted using one of four alias dereferencing modes:

    =09
  1. ignore aliases while searching, returning alias entries as part of t= he results
  2. =09
  3. dereference aliases while trying to find the search base
  4. =09
  5. dereference aliases only while searching (search propagates and cont= inues at the entry referenced)
  6. =09
  7. dereference aliases while trying to find the search base, and while = searching

Aliases complicate the mechanics of conducting search operations. Searc= hes, without alias dereferencing enabled, contain a clear candidate range f= or which filter evaluation is applied. Without dereferencing the candidate= range is based on the search scope parameter and the search base. Alias d= ereferencing expands the candidate range into areas of the DIT that would n= ot have been normally eligible for filter evaluation. Candidate determinat= ion with dereferencing must factor in the presence of aliases.

Three specialized system indices are used to manage the 3 modes above wh= ich allow for some kind of alias dereferencing. These system indices are c= alled the alias, oneAlias, and subAlias indices. We describe what each ind= ex contains and how it is used in the subsections below.

Aliases are entries of the alias objectClass which reference other entri= es in the DIT. Aliases are not allowed to have subordinate (child) entries.= To the left you'll see the schema ER diagram for the alias objectClass an= d it's mandatory aliasedObjectName attributeType.

Example DIT Reloaded (with Aliases)

But first we should add some alias entries to our example DIT to show wh= at the composition of these alias indices would be. On the right is our ex= ample DIT with some new alias entries in red. Two new alias entries have b= een added:

Id Alias DN Ref. Id Referenced Entry DN
8 6 commonName=3DJim Bean,ou=3DBoard of Directors,o= =3DGood Times Co. cn=3DJIM BEAN,ou=3DSales,o=3DGood Times Co.<= /td>
9 5 2.5.4.3=3DJohnny Walker,ou=3DEngineering,o=3DGoo= d Times Co. cn=3DJOhnny WAlkeR,ou=3DSales,o=3DGood Times Co.=

Follow the link on the image to the right or click here to view the LDIF= for these two newly added alias entries. These LDIF entries have the exte= nsibleObject objectClass set for a value of their objectClass attribute. T= his marker is a cue to allow these entries to contain values for any define= d attribute in the server's schema. It's what allows us to use a commonNam= e attribute for the relative distinguished names of these entries. Also we= mix things up here to make a point regarding attribute name variance: the = attribute identifier for commonName can be 'cn', 'commonName', or even the = OID for this attributeType which is 2.5.4.3.

3D""Digram needs to be updated to reflect= the changes done in StoreUtils.java

Alias In= dex (alias)

=

The 'alias' index is just like a user index on the aliasedObjectN= ame attribute. The normalized distinguished name of the aliased object (th= e value of aliasedObjectName attribute) is stored in Tuple keys of the forw= ard LUT. The Long identifier of the alias entry is stored in the Tuple val= ue. The alias index is not essential, however it does improve the performa= nce of some operations as an optimization.

To the right, you'll see the composition of the forward LUT for this ind= ex. It's straight forward: value of aliasedObjectName normalized maps to t= he id of the alias entry containing this value.

<= /a>One Level Alias Index (oneAlias)

The one level alias index assists in candidacy determination when one le= vel search is enabled with dereferencing while searching. This index conta= ins Long identifiers for both the key and value fields of it's Tuples. For= the forward LUT, the Tuple keys represent non-leaf entries with alias chil= d entries NOT pointing to their siblings. Child alias entry identif= iers are stored in the value of the corresponding Tuple. Child aliases poi= nting to entries that are siblings do not alter the candidate range. Only = those child aliases that point to non-sibling entries expand the candidate = range by bringing the referred to entries into scope.

To the right, you'll see the composition of the forward LUT for this ind= ex. The first Tuple maps id 3 to id 8. This is because the alias entry 8,= expands the scope of one level search when conducted with search base set = to entry 3. Entry 8 brings it's referenced entry, entry with id 6, into th= e scope of the search. Notice the alias entry, the entry with id 8, does n= ot point to a sibling so it is included. The same reasoning holds for the = second Tuple in this index's forward LUT.

3D""Digram needs to be updated with the c= orrect key-value ids

<= /a>Sub Level Alias Index (subAlias)

The subtree level alias index assists in candidacy determination when su= btree level search is enabled with dereferencing while searching. This ind= ex also contains Long identifiers for both the key and value fields of it's= Tuples. For the forward LUT, the Tuple keys represent non-leaf entries wi= th alias descendants referencing entries that are NOT descendants of= the key. Descendant alias entry identifiers are stored in the value of th= e corresponding Tuple. Descendant aliases pointing to entries that are des= cendants of the key do not alter the candidate range. Only those descendan= t aliases that point to non-descendant entries expand the candidate range b= y bringing the subtree of nodes rooted at the referred to entries into scop= e. Another way to think about this index is that it tracks all the ancesto= rs of alias entries.

To the right, you'll see the composition of the forward LUT for this ind= ex. The first Tuple maps id 3 to id 8. This is because the alias entry 8,= expands the scope of subtree level search when conducted with search base = set to entry 3. Entry 8 brings it's referenced entry, entry with id 6, int= o the scope of the search. Notice the alias entry, the entry with id 8, do= es not point to a descendant of the search base, entry with id 3, so it is = included. The same reasoning holds for the second Tuple in this index's fo= rward LUT.

Notice though we do not have Tuples for (1, 8) and (1,9). Although alia= s entries 8 and 9 are descendants of entry 1, they do not expand the search= scope with subtree search operations using entry 1 as the base. All these= aliases and the entries they point to were reachable as candidates before = creating alias entries 8 and 9. Only alias entries pointing out of the sub= tree of the base tow entries that are not search base descendants are track= ed by this index.

3D""Digram needs to be updated with the c= orrect key-value ids

Handling Dereferencing with Object Level Scope

Object level search requests are trivially handled. Only two of the dere= ferencing modes matter to us in this case.

Mode Considered Reason
neverDerefAliases no this is a normal search and dereferencing does n= ot matter
derefFindingBase yes if the base is an alias we must dereference it b= efore filter evaluation
derefSearching no with object level search we're not searching
derefAlways yes same as derefFindingBase

So if alias dereferencing is in effect always or when finding the base a= check must be performed to see if the search base entry is an alias. If i= t is an alias, the referenced entry is resolved, and the search base is set= to the referenced entry. In this simple case with object level scope, ali= as dereferencing does not expand the single candidate. Instead alias deref= erencing switches the base to the referenced entry.

Handling Dereferencing with One Level Scope

One level search requests apply the same technique of switching the base= when base dereferencing is enabled. Actually all search scopes switch the= search base when base dereferencing is enabled, and the original search ba= se is in fact an alias.

The search engine uses this index to find aliases which expand the range= of candidates to entries outside the scope. After the base entry is resol= ved, the search engine, uses the identifier of the base to lookup all alias= children. The referred to entries are then included in search filter eval= uation and returned if accepted by the filter. This way the candidate rang= e expands to include referenced entries in the aliasedObjectName attribute.=

3D""When alias dereferencing while search= ing is enabled, ScopeNodes, in the filter AST, should not be used as candid= ate enumerators. In fact the search engine's optimizer should annotate the= scope node with a maximum count value when dereferencing while searching i= s enabled. I do not know if this is the case today.
=20

For our example, let's do a search where the search base is entry with i= d 4. The scope is one level and derefSearching is enabled. The search eng= ine positions a Cursor over the children of entry 4 using the onelevel hier= archy index. It joins the results of this Cursor with the results from ano= ther Cursor on the oneAlias index over forward LUT key 4. This includes en= tries 6, 7 and 9 as candidates. Entry 9 however is filtered out of the res= ult set when derefSearching is enabled. It is not returned since it is an = alias.

Handling Dereferencing with Subtree Level Scope

Dereferencing while searching with subtree scope is rather painful. Aft= er dereferencing the base, if needed and derefFindingBase is enabled, the s= earch engine looks up all alias descendants in the subAlias index. Alias d= escendants are then collected, and put into a special ScopeAssertion which = uses this list and search scope parameters to determine if an entry is an e= ligible candidate for filter evaluation.

The semantics of search with derefSearching enabled presumes search prop= agation into dereferenced subtrees. This means a propagated search may enc= ounter new alias entries which it must dereference as well.

You'll see in more advanced sections of this guide that assertion and ca= ndidate enumerating Cursors are used by the search engine. One assertion is= the scope assertion. For search without alias dereferencing while searchi= ng, this scope assertion just makes sure the candidate entries are in scope= by checking if they are subordinate to (with on level search) or descendan= ts of the search base (with subtree level search).

When derefSearching is enabled, the search engine modifies how this scop= e assertion determines if a candidate is within scope. The algorithm is si= mple and it uses the subAlias index with subtree scope search:

    =09
  1. Use the provided search base parameter to find all alias entries tha= t are descendants of the base and point to entries outside the scope to ent= ries that are not descendants of the search base. To do so we get a Cursor = on the forward LUT of the subAlias index over Tuples with the key of the se= arch base.
  2. =09
  3. For each value of the Tuples returned, repeat the process recursivel= y by presuming the alias entry is now the search base. Stop recursing when= no scope expanding alias descendants are found. Do this while collecting = the complete set of aliases found.
  4. =09
  5. Use the set of aliases and the search base in the scope assertion: t= he assertion returns true if any candidate entry is a descendant of any of = these aliases or the search base. Otherwise it returns false.

By modifying the behavior of the scope assertion to factor in scope expa= nding aliases in the scope constraint check we effectively appear to propag= ate search through aliases.

Optimizations
Sometimes aliases w= ill be return in this set of aliases to consider in scope assertions, which= are descendants of other aliases already in this set. With subtree search= ing the search will automatically hit these descendant aliases so there's n= o need to have them in the set. Hence we can prune this set, or perform ch= ecks before insertion to prevent including redundant aliases in this set.
=20

We will cover more of this while looking closer at the search engine in = different sections of this documentation.

Alias Limitat= ions

ApacheDS places some limitations on Aliases. The limitations below are = imposed either for the sake of simplicity in conducting search or due to st= ructural limitations.

    =09
  1. Alias chaining where one alias refers to another alias is not allowe= d.
  2. =09
  3. Aliases can only reference entries within the same Partition. This = limitation is not by choice and due to the inability to share access to ind= ices across partitions which may change in the future.

objectClass I= ndex

For the time being, the objectClass index is considered a system index. = This is purposefully done since objectClass values are critical for server = operation and used heavily. Without setting this index performance would b= e terrible.

User Indices

The partition can be asked to maintain an index on any attribute defined= in the servers schema. These indices contain Tuples in their forward LUT,= mapping a value of the attribute, to the id of the entry with that attribu= te value. This way after consulting the index, entries with specific value= s for this attribute can be quickly looked up by identifier in the master t= able.

Other Usefu= l Indices

The function of several optional ApacheDS features may be greatly enhanc= ed by the presence of indices on certain attributes. One example is the re= vision attribute which has not yet been defined, but it will. This attribu= te will be used by the change log facility to track the last altering revis= ion on an entry. When this system and it's related snapshoting capabilitie= s are enabled, having an index on this attribute will be very valuable.

Other similar examples exist. You'll find for different usecases, and f= or different kinds of canned search distributes, indices on some attributes= will greatly improve search performance. More about this is covered on th= e search engine section of this guide.