directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefan Seelmann <>
Subject Re: RDN index, oneLevel and sublevel index merge
Date Sat, 08 Oct 2011 12:52:19 GMT
Hi Emmanuel,

On Sat, Oct 8, 2011 at 8:27 AM, Emmanuel Lecharny <> wrote:
> Hi guys,
> Stefan started to modify the code to get rid of the oneLevel and subLevel
> index, which are more or less useless as we already have the hierarchy
> stored into the rdn index.
> However, this rdn index is not good enough as is to be use as a replacement
> for the two other indexes. Its structure forbid us to easily retrieve the
> children from a known entry.
> The current RDN index structure is :
> <parentId, RDN> -> Entry ID
> The key is a tuple containing the parent ID to be able to rebuild the DN.
> The reverse index is :
> Entry ID -> <parentId, RDN>
> We don't have duplicated values.
> Now, when we have an entry ID, there is no simple way to get the list of all
> the children for this entry.
> We will have to add a third index to deal with such searches :
> ParentId -> <entryId, ....>
> which will list all the children of a specific entry.
> I'm going to investigate around this idea i the next few days.
> Thoughts ?

Do we really need the additional index? My idea was to avoid this
additional index. Therefor I introduced a new cursor, the
RdnIndexTreeCursor [1] together with a RdnIndexHelper [2].

Let's take the following example DIT:

dc=example,dc=com (1)
* ou=people (3)
** uid=elecharny (4)
** uid=seelmann (6)
* ou=groups (2)
** ou=directory (5)
** ou=mina (7)

The corresponding RDN index looks like this:

<parentId, RDN> -> <entryId>
0,dc=example,dc=com -> 1
1,ou=groups -> 2
1,ou=people -> 3
2,ou=directory -> 5
2,ou=mina -> 7
3,uid=elecharny -> 4
3,uid=seelmann -> 6

One-level traversal works as follows: the RdnIndexTreeCursor is
initialzed with the parentId (say 2 for ou=groups). To be able to
limit the cursor's range a start element and a stop element are
created. Both are ParentIdAndRdn objects with the same parentId (in
our example 2). The start element's RDN is null. The stop elements's
RDN is a special RDN zzz=zzz. Those ements are used to position the
cursor with first()/beforeFirst()/last()/afterLast(). The next() and
previous() methods check that the range isn't exceeded.

Sub-Level traversal is a bit more complex but still straight-forward.
As the search base is included in the search results the base
ParentIdAndRdn object is used as start and stop element (say
<1,ou=groups>). Depth-first traversal is used and a stack of cursors
is maintained. When advancing the cursor it checks if the current
entry has children, if that's the case a sub-cursor with a limited
range (like the one-level) is opened and put to the stack.

The RdnIndexTreeCursor is used by OneLevelScopeCursor and SubScopeCursor.

The OneLevelScopeEvaluator and SubScopeEvaluator use the
RdnIndexHelper, especially the methods isDirectDescendantOf(ID
ancestorId, ID descendantId) and isDescendantOf(ID ancestorId, ID
descendantId). They just do reverse lookups from the descendantId,
till the ancestor is found. It is possible to cache those lookups for
branch entries so that only one lookup is required.

Finally the DefaultOptimizer uses the RdnIndexHelper to determine the
sub-level count. Therefor the ParentIdAndRdn now contains two
additional fields oneLevelCount and subLevelCount that are
incremented/decremented when adding/removing/moving entries to/from
the partition.

Let's talk about the downside: This solution should work for flat
trees, but not for very deep trees. For very deep trees the
RdnIndexTreeCursor needs to maintain a large number of opened cursors.
Also the caching of the isDescendantOf information would require more
memory for deep trees.

Do you think that is a feasible approach?

I try to update the branch tomorrow, but can't promise anything.

Kind Regards,


View raw message