On Tue, Jul 20, 2010 at 11:15 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
 Hi Howard,

On 7/20/10 9:29 AM, Howard Chu wrote:

Some side note:

after running some perf tests on the evaluator and applying some
improvements, I can tell that, depending on the number of subentries an
entry depends on, the cost of this evaluation can go up to 50% of the
cost of the search itself (not counting the network layer). For instance,
evaluating a subtreeSpecification with a min and a max, no chop, can be
done up to 1 000 000 times per second on a 3-level DN (this all depends
on the DN size).
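
For illustration, here is a minimal sketch of what such an evaluation involves. It is not the ApacheDS evaluator: the class and method names are hypothetical, DNs are split naively on commas (no escaping), and minimum and maximum are assumed to count RDNs below the base, as in RFC 3672. A negative maximum stands for "no maximum", and chop (specific exclusions) is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SubtreeSketch {
    // Hypothetical helper: split a DN into normalized RDNs, leaf first.
    // Naive on purpose: escaped commas and multi-valued RDNs are ignored.
    static List<String> rdns(String dn) {
        List<String> out = new ArrayList<>();
        for (String part : dn.split(",")) {
            String rdn = part.trim().toLowerCase(Locale.ROOT);
            if (!rdn.isEmpty()) {
                out.add(rdn);
            }
        }
        return out;
    }

    // True if entryDn lies under baseDn at a depth within [minimum, maximum].
    static boolean inSubtree(String entryDn, String baseDn, int minimum, int maximum) {
        List<String> entry = rdns(entryDn);
        List<String> base = rdns(baseDn);
        int depth = entry.size() - base.size();
        if (depth < 0) {
            return false; // entry is shorter than the base, cannot be below it
        }
        // The base must be the trailing part of the entry DN.
        if (!entry.subList(depth, entry.size()).equals(base)) {
            return false;
        }
        return depth >= minimum && (maximum < 0 || depth <= maximum);
    }

    public static void main(String[] args) {
        String base = "dc=example,dc=com";
        System.out.println(inSubtree("cn=a,ou=people,dc=example,dc=com", base, 1, 2));
        System.out.println(inSubtree("cn=a,ou=people,dc=example,dc=com", base, 1, 1));
    }
}
```

Even in this simplified form the work is proportional to the number of RDNs in the DN, and it is repeated for every subentry that might apply to the entry.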

IMO, the considerations here are the same as for the O(1) rename. I.e., when you remove the entryDN from the entry in the DB, you have to calculate the DN on the fly, and it certainly is a frequently referenced datum. You make this cheap by caching the entryDN in memory, and it's very clear when a cached DN must be invalidated - most of the time the cached value will not change.
The DN cache is most certainly needed for faster operations. Building a DN on the fly for every entry is one of the most costly operations, so if we can speed it up with a cache, it's a net gain. Having the DN in the entry, OTOH, is not necessarily a big gain: you still have to deserialize it if it's not in the cache, and this is also costly.
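
A rough sketch of the idea, under stated assumptions: entries are identified by a numeric id, each entry records only its parent id and its own RDN, and id 0 stands for the root. All names here are hypothetical, and a real cache would be bounded and thread-safe. The DN is built by walking up the parents on a cache miss, and a rename invalidates only the cached DNs of the renamed entry and its descendants.

```java
import java.util.HashMap;
import java.util.Map;

class DnCacheSketch {
    // Hypothetical storage: entry id -> parent id, and entry id -> own RDN.
    final Map<Long, Long> parentOf = new HashMap<>();
    final Map<Long, String> rdnOf = new HashMap<>();
    // In-memory cache of computed DNs.
    final Map<Long, String> dnCache = new HashMap<>();
    // Counts how many times a DN had to be rebuilt (cache misses).
    int builds = 0;

    void add(long id, long parentId, String rdn) {
        parentOf.put(id, parentId);
        rdnOf.put(id, rdn);
    }

    // Return the DN of an entry, building it on the fly only on a cache miss.
    String dn(long id) {
        String cached = dnCache.get(id);
        if (cached != null) {
            return cached;
        }
        builds++;
        StringBuilder sb = new StringBuilder();
        for (Long cur = id; cur != null && cur != 0L; cur = parentOf.get(cur)) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(rdnOf.get(cur));
        }
        String built = sb.toString();
        dnCache.put(id, built);
        return built;
    }

    boolean isSelfOrDescendant(long id, long ancestor) {
        for (Long cur = id; cur != null && cur != 0L; cur = parentOf.get(cur)) {
            if (cur == ancestor) {
                return true;
            }
        }
        return false;
    }

    // Rename touches one stored RDN, then drops the cached DNs it made stale.
    void rename(long id, String newRdn) {
        rdnOf.put(id, newRdn);
        dnCache.keySet().removeIf(k -> isSelfOrDescendant(k, id));
    }
}
```

Repeated calls to dn() for the same entry hit the cache, and only a rename (or, in a fuller version, a move) of the entry or one of its ancestors forces a rebuild.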

Obviously, all those considerations fall into a big dark hole if you have a decent entry cache, as the entries in memory already store the full DN... Any modification like a rename or a move will of course invalidate the entries in this cache.

All in all, in most cases you don't have to do all those computations...

Regarding the subtree handling, it's different, as you can't skip the entry evaluation if the entries don't contain a reference to the subentries they depend upon. This evaluation can be costly, to the point where it's more expensive than fetching the entry itself.

The rationale behind the choice I made 3 years ago (and which was reverted) to put the DN into the entry was simply to speed up searches by avoiding costly computation, at the price of costly but infrequent operations like Move or Rename (MODDN).

If you have to move data in an LDAP base, User, then you have to pay the price!


Well yes, but even renames cost the same as moves if the DN is in the entry. Someone changing an ou=People to ou=Users containing 100 million entries should not expect to wait hours before it completes. Plus the atomicity issue is seriously nasty. The DN embedded into the Entry was definitely not the way to go. In fact, Kiran and Seelmann's new RDN index, which replaces the DN index, saved us big time by making these operations atomic, faster and safer.
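
To put numbers on the difference, here is a small sketch that counts the records written by a rename under the two layouts. Both layouts are simplified stand-ins with hypothetical names, not the actual ApacheDS index structures: in the first, every entry stores its full DN; in the second, an index holds only each entry's own RDN.

```java
import java.util.HashMap;
import java.util.Map;

class RenameCostSketch {
    // Layout A (hypothetical): full DN stored per entry.
    // Every entry at or below oldDn must be rewritten.
    static int renameEmbedded(Map<Long, String> dnById, String oldDn, String newDn) {
        int writes = 0;
        for (Map.Entry<Long, String> e : dnById.entrySet()) {
            String dn = e.getValue();
            if (dn.equals(oldDn) || dn.endsWith("," + oldDn)) {
                e.setValue(dn.substring(0, dn.length() - oldDn.length()) + newDn);
                writes++;
            }
        }
        return writes;
    }

    // Layout B (hypothetical): only the entry's own RDN is stored.
    // A rename rewrites a single record, whatever the subtree size.
    static int renameRdnIndex(Map<Long, String> rdnById, long id, String newRdn) {
        rdnById.put(id, newRdn);
        return 1;
    }

    public static void main(String[] args) {
        Map<Long, String> dnById = new HashMap<>();
        Map<Long, String> rdnById = new HashMap<>();
        dnById.put(1L, "ou=People,dc=example,dc=com");
        rdnById.put(1L, "ou=People");
        for (long i = 2; i <= 1001; i++) {
            dnById.put(i, "uid=u" + i + ",ou=People,dc=example,dc=com");
            rdnById.put(i, "uid=u" + i);
        }
        // Prints 1001: the renamed entry plus its 1000 children.
        System.out.println(renameEmbedded(dnById,
                "ou=People,dc=example,dc=com", "ou=Users,dc=example,dc=com"));
        // Prints 1.
        System.out.println(renameRdnIndex(rdnById, 1L, "ou=Users"));
    }
}
```

A single-record update is also much easier to make atomic than a rewrite that spans the whole subtree.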

--
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu