Hi all,

<note>
Not a dig at anyone for supporting DN presence in entries.
</note>

Similar Problem We Are Familiar With
--------------------------------------------------------------

During this ApacheCon we had discussions around the problem of embedding the DN in an entry.  Presently the DN is maintained both in the entry and in the DN index.  This makes modifyDN operations very expensive.  The classic scenario is when a parent entry with millions of descendants has its DN modified.  The result is devastating:

(1) millions of entries are looked up from the master table
     (a) millions of disk blocks are accessed
     (b) millions of entries are deserialized
(2) the entry cache turns over, losing its accumulated cache hits
(3) the operation can take minutes if not hours


The Problem With Subentry Changes
------------------------------------------------------------

Almost exactly the same problem is causing the subentry performance issue we have been observing.  When a subentry is created, all entries selected by the subentry's subtree specification are altered to point to that subentry for its administrative purpose.  For example, if the subentry is for access control then the DN of the subentry is added to the multi-valued accessControlSubentries attribute of every entry selected by the subentry's subtreeSpecification.  This attribute is persisted in the entry within the master table.  When the subentry is removed or renamed, or its subtree specification is altered, all the entries previously selected are recalled from the master table to remove the old reference.  Then the new reference is added to all entries selected by the new subtree specification.

Like the modifyDN issue, this process is very expensive because of the massive number of master table read-write operations.  As with the modifyDN problem, a specialized index could minimize the impact of these kinds of alterations on large sets of entries.


The Solution
--------------------

First a special [subentry ID<->entry ID] index will be maintained.  This maps the ID of a subentry to the ID of each entry the subentry selects via its subtree specification.  Note we're not differentiating between the different kinds of subentries.  When an entry is looked up with a specific request for a subentry association attribute like accessControlSubentries, we can inject this attribute into the entry before returning it.  To compute the contents of the accessControlSubentries attribute all we have to do is walk this index in the opposite direction for the entry ID.  For each subentry found, we check the objectClass index to see if it is an accessControlSubentry.  If it is, the DN of that subentry is added to the accessControlSubentries attribute.  Since subentry reference attributes are all operational and normally are not returned unless explicitly asked for, we seldom need to add these attributes, and so we seldom need to access this index.
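
To make the reverse walk concrete, here is a rough in-memory sketch in Java.  It is not the partition code: SubentryIndexSketch, add, subentriesOf and the two callbacks (one standing in for the objectClass index check, one for the ID-to-DN lookup) are all names I made up for illustration, and a real index would live on disk next to the other partition indices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongFunction;
import java.util.function.LongPredicate;

/** Hypothetical in-memory stand-in for the [subentry ID <-> entry ID] index. */
class SubentryIndexSketch {
    // Forward direction: subentry ID -> IDs of the entries it selects.
    private final Map<Long, Set<Long>> forward = new HashMap<>();
    // Reverse direction: entry ID -> IDs of the subentries selecting it.
    private final Map<Long, Set<Long>> reverse = new HashMap<>();

    /** Records that the subentry selects the entry, in both directions. */
    void add(long subentryId, long entryId) {
        forward.computeIfAbsent(subentryId, k -> new TreeSet<>()).add(entryId);
        reverse.computeIfAbsent(entryId, k -> new TreeSet<>()).add(subentryId);
    }

    /** Walks the index in the reverse direction for one entry ID. */
    Set<Long> subentriesOf(long entryId) {
        return reverse.getOrDefault(entryId, Collections.emptySet());
    }

    /**
     * Computes the values of accessControlSubentries for one entry.
     * isAccessControlSubentry stands in for the objectClass index check and
     * dnOf for the ID-to-DN lookup; both are assumptions of this sketch.
     */
    List<String> accessControlSubentries(long entryId,
                                         LongPredicate isAccessControlSubentry,
                                         LongFunction<String> dnOf) {
        List<String> dns = new ArrayList<>();
        for (long subentryId : subentriesOf(entryId)) {
            if (isAccessControlSubentry.test(subentryId)) {
                dns.add(dnOf.apply(subentryId));
            }
        }
        return dns;
    }
}
```

The point of the sketch is only that the attribute is computed on demand from the index and never read from the master table entry itself.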

Presently the authorization/subtree subsystem will inject these attributes into entries before pumping them into the partition.  This means that we can maintain this index by finding these subentry association attributes and using their information before deleting them.
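
A minimal sketch of that stripping step, again in Java with made-up names: the entry is modelled as a plain map from attribute name to values, SubentryRefExtractor and stripReferences are hypothetical, and idOfDn stands in for whatever DN-to-ID lookup the partition already has.  Only accessControlSubentries is handled here; other subentry association attributes would be treated the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical helper: pulls subentry references out of an incoming entry. */
class SubentryRefExtractor {
    // Assumes attribute names were already normalized to this spelling.
    static final String ACCESS_CONTROL_SUBENTRIES = "accessControlSubentries";

    /**
     * Removes the association attribute from the entry so it is not persisted
     * in the master table, and returns the IDs of the subentries it named so
     * the caller can add [subentry ID, entry ID] tuples to the index.
     */
    static List<Long> stripReferences(Map<String, List<String>> entry,
                                      ToLongFunction<String> idOfDn) {
        List<Long> subentryIds = new ArrayList<>();
        List<String> dns = entry.remove(ACCESS_CONTROL_SUBENTRIES);
        if (dns != null) {
            for (String dn : dns) {
                subentryIds.add(idOfDn.applyAsLong(dn));
            }
        }
        return subentryIds;
    }
}
```

The entry passed in must be a mutable map, since the attribute is removed in place before the entry goes to the master table.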
   
Issues
----------

I don't like this solution since it puts more responsibility on the partition implementation to consistently adhere to these semantics.  It also presumes every partition will have the ability to create this index.  This makes me uncomfortable.  However, it would make the server faster in its ability to respond to subentry changes.  All the changes would occur on this index instead of on the master table.
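
To illustrate what "all the changes would occur on this index" might look like, here is a small Java sketch.  SubentrySelectionSketch, reselect, remove and selects are hypothetical names, and the set of newly selected entry IDs is assumed to have been computed elsewhere by evaluating the subtree specification.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: subentry changes touch index tuples only. */
class SubentrySelectionSketch {
    // Subentry ID -> IDs of the entries it currently selects.
    private final Map<Long, Set<Long>> entriesBySubentry = new HashMap<>();

    /** Subentry added, or its subtree specification altered: swap in the new selection. */
    void reselect(long subentryId, Collection<Long> newlySelectedEntryIds) {
        entriesBySubentry.put(subentryId, new HashSet<>(newlySelectedEntryIds));
    }

    /** Subentry deleted: drop all of its tuples. */
    void remove(long subentryId) {
        entriesBySubentry.remove(subentryId);
    }

    /** True if the subentry currently selects the entry. */
    boolean selects(long subentryId, long entryId) {
        Set<Long> selected = entriesBySubentry.get(subentryId);
        return selected != null && selected.contains(entryId);
    }
}
```

Because the tuples in this sketch are keyed on IDs rather than DNs, renaming a subentry would not need to touch them at all, assuming IDs stay stable across a rename.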

Thoughts?

Alex