On Thu, Apr 12, 2012 at 4:01 PM, Emmanue= l L=E9charny wrote:
Hi guys,
I'm currently working on the SubLevelIndex removal. This is slightly mo= re complex than the OneLevel =A0index, and there are some choice to be made= in this area.

Basically, the idea is that we will first search the entry which is the par= ent, then start fetching all the descendant from this point.

Considering we have such a tree :

cn=3DA
=A0cn=3DB
=A0cn=3DC
=A0 =A0cn=3DD
=A0 =A0 =A0cn=3DG
=A0 =A0cn=3DE
=A0cn=3DF
=A0 =A0cn=3DH

then a search for entries using a SUBTREE scope and a base DN 'cn=3DC, = cn=3DA' will return the following set of entries (the filter is objectC= lass=3D*, to keep it simple) :
{
=A0cn=3DC,cn=3DA
=A0cn=3DD,cn=3DC,cn=3DA
=A0cn=3DG,cn=3DD,cn=3DC,cn=3DA
=A0cn=3DE,cn=3DC,cn=3DA
}

Note that the same search with a ONE scope would return cn=3DD,cn=3DC,cn=3D= A and cn=3DE,cn=3DC,cn=3DA, excluding the cn=3DC,cn=3DA entry, which is the= base for this search).

Now, we have two ways to get those entries :
- depth-first traversal. We fetch every entry, and if it has some children,= then we go down one level, and so on recursively, until we have exhausted = all the descendant. The consequence is that the entries will be returned in= this order :

=A0cn=3DC,cn=3DA
=A0cn=3DD,cn=3DC,cn=3DA
=A0cn=3DG,cn=3DD,cn=3DC,cn=3DA
=A0cn=3DE,cn=3DC,cn=3DA

- Breadth-first traversal. This time, we exhaust all the children for the c= urrent level, and then we go down one level, read all the entries, etc. We = never go down one level if all the entries at the same level have not been = processed. The entries will be returned in this order :

=A0cn=3DC,cn=3DA
=A0cn=3DD,cn=3DC,cn=3DA
=A0cn=3DE,cn=3DC,cn=3DA
=A0cn=3DG,cn=3DD,cn=3DC,cn=3DA

The problem with the breadth-first approach is that it puts way more pressu= re on the memory, as we have to keep in memory all the entries that have ch= ildren. With the depth-first approach, we just proceed with a new cursor wh= en we have at least one children, so we will have as many cursors in memory= as the tree's depth (so if we have a 10 levels tree, we will keep 10 c= ursors max). OTOH, the order might be a bit strange (even if there is no gu= arantee whatsoever that the server will return the entry in any given order= ).

IMO, the depth-first approach is the one which offers the best balance betw= een performance and memory consumption. Also the return order is not that c= ritical assuming that we have implemented the Sort control (which is not ye= t part of our implementation).

Any better idea, or any comments ?

My i= nitial thought right after reading the email was DFS definitely because of = the following reasons:

(1) As you say order is not= important unless a sort control is used and that's a separate issue
(2) DFS is less memory intensive than BFS since it allows the GC to fr= ee up allocations when stack frames are popped.=A0
(3) Most DIT&#= 39;s today are rather flat so the DFS cursors will not grow that large and = the BFS approach can blow out memory when you have millions of sibling entr= ies under a container.

Then I got the idea of making this configurable so we c= an play with parameters but that might not be worth the pain. Thoughts?=A0<= /div>

--
Best Regards,
-- Alex

--f46d04426f148b437204bd7c29c3--