On Sun, Oct 9, 2011 at 7:00 PM, Emmanuel Lecharny <elecharny@apache.org> wrote:
>
>
> On Sat, Oct 8, 2011 at 2:52 PM, Stefan Seelmann <seelmann@apache.org> wrote:
>>
>> Hi Emmanuel,
>>
>> On Sat, Oct 8, 2011 at 8:27 AM, Emmanuel Lecharny <elecharny@gmail.com>
>> wrote:
>> > Hi guys,
>> >
>> > Stefan started to modify the code to get rid of the oneLevel and
>> > subLevel
>> > index, which are more or less useless as we already have the hierarchy
>> > stored into the rdn index.
>> >
>> > However, this rdn index is not good enough as is to be use as a
>> > replacement
>> > for the two other indexes. Its structure forbid us to easily retrieve
>> > the
>> > children from a known entry.
>> >
>> > The current RDN index structure is :
>> > <parentId, RDN> -> Entry ID
>> >
>> > The key is a tuple containing the parent ID to be able to rebuild the
>> > DN.
>> >
>> > The reverse index is :
>> > Entry ID -> <parentId, RDN>
>> >
>> > We don't have duplicated values.
>> >
>> > Now, when we have an entry ID, there is no simple way to get the list of
>> > all
>> > the children for this entry.
>> >
>> > We will have to add a third index to deal with such searches :
>> > ParentId -> <entryId, ....>
>> >
>> > which will list all the children of a specific entry.
>> >
>> > I'm going to investigate around this idea i the next few days.
>> >
>> > Thoughts ?
>>
>> Do we really need the additional index? My idea was to avoid this
>> additional index. Therefor I introduced a new cursor, the
>> RdnIndexTreeCursor [1] together with a RdnIndexHelper [2].
>>
>> Let's take the following example DIT:
>>
>> dc=example,dc=com (1)
>> * ou=people (3)
>> ** uid=elecharny (4)
>> ** uid=seelmann (6)
>> * ou=groups (2)
>> ** ou=directory (5)
>> ** ou=mina (7)
>>
>> The corresponding RDN index looks like this:
>>
>> <parentId, RDN> -> <entryId>
>> 0,dc=example,dc=com -> 1
>> 1,ou=groups -> 2
>> 1,ou=people -> 3
>> 2,ou=directory -> 5
>> 2,ou=mina -> 7
>> 3,uid=elecharny -> 4
>> 3,uid=seelmann -> 6
>>
>> One-level traversal works as follows: the RdnIndexTreeCursor is
>> initialzed with the parentId (say 2 for ou=groups). To be able to
>> limit the cursor's range a start element and a stop element are
>> created. Both are ParentIdAndRdn objects with the same parentId (in
>> our example 2). The start element's RDN is null. The stop elements's
>> RDN is a special RDN zzz=zzz. Those ements are used to position the
>> cursor with first()/beforeFirst()/last()/afterLast(). The next() and
>> previous() methods check that the range isn't exceeded.
>
>
> Do you mean that thos elements are created and injected in the RDN index for
> each entry having children ?
No, they are just used just used to position the cursor.
>> Sub-Level traversal is a bit more complex but still straight-forward.
>> As the search base is included in the search results the base
>> ParentIdAndRdn object is used as start and stop element (say
>> <1,ou=groups>). Depth-first traversal is used and a stack of cursors
>> is maintained. When advancing the cursor it checks if the current
>> entry has children, if that's the case a sub-cursor with a limited
>> range (like the one-level) is opened and put to the stack.
>>
>> The RdnIndexTreeCursor is used by OneLevelScopeCursor and SubScopeCursor.
>>
>> The OneLevelScopeEvaluator and SubScopeEvaluator use the
>> RdnIndexHelper, especially the methods isDirectDescendantOf(ID
>> ancestorId, ID descendantId) and isDescendantOf(ID ancestorId, ID
>> descendantId). They just do reverse lookups from the descendantId,
>> till the ancestor is found. It is possible to cache those lookups for
>> branch entries so that only one lookup is required.
>>
>> Finally the DefaultOptimizer uses the RdnIndexHelper to determine the
>> sub-level count. Therefor the ParentIdAndRdn now contains two
>> additional fields oneLevelCount and subLevelCount that are
>> incremented/decremented when adding/removing/moving entries to/from
>> the partition.
>>
>> Let's talk about the downside: This solution should work for flat
>> trees, but not for very deep trees. For very deep trees the
>> RdnIndexTreeCursor needs to maintain a large number of opened cursors.
>
>
> we can close the cursors when they are not used anyƶore. Doing a width first
> search, we will only create as many cursor as we need for one single level.
> This should not be overkilling.
>
>>
>> Also the caching of the isDescendantOf information would require more
>> memory for deep trees.
>>
>> Do you think that is a feasible approach?
>
>
> it should work...
>>
>> I try to update the branch tomorrow, but can't promise anything.
>
>
> ok, many thanks ! I tested the branch, there are some failing tests in
> core-integ, but had little time to investigate.
Yes, some tests are still failing. For example I observed some kind of
deadlock when recursively deleting entries in a deep nested tree.
But unfortunately I didn't manage to continue with the branch. I tried
to merge the recent changes from trunk but come conflicts occured. I
think it doesn't make sense to continue for me because due the the
high trunk activity it takes too much time to merge in the trunk
changes.
Kind Regards,
Stefan
|