directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lecharny <>
Subject Update about subtree problems
Date Mon, 19 Jul 2010 23:34:47 GMT
  Hi guys,

just an update about the current situation.

We discussed on this ML about options regarding how to handle subtrees 
in the most efficient way in the server. We currently have 3 possibilities :

- the first one, which is partially implemented in trunk, consist on 
static references to subentries in all the entries selected by the 
subtreeSpecifications defined in each subentry.

- the second one suppose that we compute when needed if an entry is a 
selected entry, each time we access this entry

- the third solution, which has not been discussed already, would be to 
have an index which stores the list of all the selected entries 
associated with each subentry.

Let's first think about the third solution (this is the one implemented 
by OpenDS, btw). It's a costly one when you create a new subentry, or 
when you move one, as you have to update the index with all the impacted 
entries. Plus it also has a cost as you have to hit the index to get the 
list of subentries an entry is part of. Of course, we can use a cache, 
but I'm not sure that it will save a lot of time. In fact, AFAICT, you 
pay the price twice : when creating the subentry, and each time you 
access any entry.

Ok, so option 3 does not seems to be reasonable. To be frank, we 
discussed about t with Pierre-Arnaud but not really considered this as a 
decent solution. We may be wrong though.

Regarding option 1 and 2, none is perfect.

Let's see the pros and cons for each of them

Solution 1 (update when manipulating the subentry)
pros :
- *very* efficient operations on normal entries, as they already have a 
reference to all the subentries that select them.
- no extra cost when doing a rename, a modify, an add or a delete of a 
normal entry.
- all the subentries are stored in a cache loaded at startup, no need to 
fetch an index to get a subentry
- evaluating if an entry is selected is done when the subentry is added 
or moved
- move operations applied on normal entries are rare
- any operation applied on a subentry are considered as admin operation, 
thus unfrequent and scheduled. Downtime is under control.

cons :
- *very* costly add/delete/move/rename operations applied on a subentry, 
as every selected entries will have to be modified
- *very* costly move operation of a normal entry if it moves from one AP 
to another one, and when it has many children, as we have to update all 
the children twice : first to reove the old subentries' reference, the 
second one to add the new references
- Adding or removing a subentry is a non atomic operation, as we have 
first to update the selected entries one by one, and then to add/delete 
the subentry

Solution 2 (evaluating when needed )
pros :
- adding/removing/moving/renaming a subentry is a O(1) operation
- moving normal entries selected by some subentries does not cost more 
than a standard move
- easy to implement

cons :
- each time we access an entry (search, compare), we will have to 
evaluate the subtreeSpecifciation for all the subentries this entry is 
dependent on
- we will need another cache to lookup for the entry's related 
subentries (this cache will work pretty much like the one we use for the 

Basically, sol.1 and sol.2 are both painful, but there is no perfect 
solution anyway.

I thought about what is the best implementation all day long, and if 
yesterday I was convinced that the solution 2 was better. But the 
evaluation cost done on each entry every time a client does a search is 
quite high (let's say up to 20% of the search cost, for simple SS). That 
will not slow down the server that much, as most of the time is 
consummed in the network layer right now, but this won't last.

Now I think that the current solution is may be a not so bad compromise, 
assuming we don't do a lot of move operation cross APs on normal entries.

What bugs me here is that last year, for the opposite reason, the DN was 
removed from the entries stored in the master table, just t be able to 
do O(1) move operations. This is quite paradoxal ! I still think that 
move operations are very rare, and that storing the DN into the entries 
is a net gain for most of the operations, except for a move...

Some side note :
after having done some perf tests on the evaluator, and applied some 
improvement, I can tell that depending on the number of subentries an 
entry is depending on, the cost of this evaluation can goes up to 50% of 
the search itself cost - not counting the network layer -. For instance, 
evaluating a subtreeSpecification with a min and a max, no chop, will be 
done up to 1 000 000 times per second on a 3 level DN (this is all 
dependent on the DN size)

Last, not least : the current implementation is really incomplete. The 
Move, Rename and MoveAndRename operations are not correctly handled, 
with many entries not being updated. I'm going to fix them. I have also 
created a branch to play with subtree without breaking the trunk. I'm 
not sure I will continue to work on this branch if a decision is  made 
to keep the first solution.

Emmanuel L├ęcharny

View raw message