On Tue, May 1, 2012 at 4:08 AM, Emmanuel Lécharny <elecharny@gmail.com> wrote:
Hi,

just to inform you that the index branch has been merged with no harm today. I just had to fix 3 conflicts, and two bugs I introduced in the branch before the commit.

The server performance is way better for searches, thanks to a few improvements I made over the last 4 days. It was impressive how easy it was to improve the speed with small modifications. The overall results are now:
o Object scope search (lookup) : 49 880 req/s compared to 23 081 on the previous trunk
o One Level scope search (5 entries returned) : 68 715 entries returned per second, compared to 33 120/s
o Sub Level scope search (10 entries returned) : 70 830 entries returned per second, compared to 18 910/s


This is great work Emmanuel. Nicely done!
 
There is room for more improvement, but it will be more complex. The areas that can be improved are :
o get rid of the extra getSearchControls() call in interceptors. This is the easiest fix
o review the way we handle entry modifications before we return them. Currently, we clone the entry and remove the attributes the user has not requested. See DIRSERVER-1719 for more explanation on this subject. Note that the filtering of attributes represents around 9% of the global CPU time.
o getting back the ID from a Dn is a very costly operation (19% of the global CPU time), and the longer the DN, the longer the operation. For each RDN, we have to do a lookup in the RdnIndex. The only solution would be to have a Dn -> ID cache somewhere. This would boost the server performance, that's for sure.
o fetching an entry from the backend costs 38% of the global time, out of which 29% represents the cost of cloning the entry. If we could avoid doing this clone (see above), we may get a major performance increase.
o when evaluating an entry to see if it fits the filter, we use the reverseIndex, which is also a costly operation. We should re-evaluate whether it wouldn't be better to use the MatchingRules comparator to do that instead (reverse lookups account for 4% of the used CPU time)
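
To make the Dn -> ID cache idea from the list above more concrete, here is a minimal sketch of a bounded LRU cache placed in front of the per-RDN lookups. Everything in it is an assumption made for illustration: the class name DnIdCache, the use of normalized DN strings as keys and plain strings as IDs, and the rdnIndexLookup function standing in for the existing RdnIndex walk. It is not the server's actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded LRU cache mapping a normalized DN string to an entry ID. */
final class DnIdCache {
    private final Map<String, String> cache;

    DnIdCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the bound is exceeded.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the cached ID for the DN. On a miss, falls back to the slow
     * path (assumed here to be the RDN-by-RDN index walk) and remembers
     * the result.
     */
    synchronized String getId(String normalizedDn, Function<String, String> rdnIndexLookup) {
        String id = cache.get(normalizedDn);
        if (id == null) {
            id = rdnIndexLookup.apply(normalizedDn);
            if (id != null) {
                cache.put(normalizedDn, id);
            }
        }
        return id;
    }

    /** Drops a DN that may have become stale, e.g. after a delete, move or rename. */
    synchronized void invalidate(String normalizedDn) {
        cache.remove(normalizedDn);
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With such a cache, a repeated lookup of the same DN costs one map access instead of one index lookup per RDN. The hard part is invalidation: a move or rename of an entry would also make the cached DNs of all its descendants stale, so a real implementation would need a strategy for that.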


I guess we have these in JIRA?
 
One interesting result is that the LRUCache.get() operation represents 13% of the used time. This is definitely not small. There is probably some room for improvement here, but this is way more complex...

All those numbers have been collected using YourKit on a Lookup test (150 000 lookups on one single element have been done)


I wonder what the over-the-network stats are with a client machine separate from the server machine. Oh and with multiple clients. It's too bad we never got a chance to set up such an environment :( .
 

There are also some improvements to expect on the Add/Delete/Move operations, as we have to delete/add the keys on the RdnIndex. This is something I'm going to work on tomorrow.


Cool.
 

One more thing : the numbers I get when running the server-integ search perf tests are way lower (from 2900 to 5400 per second). This is perfectly normal. When going through the network, we pay some extra price :
o the client code eats 57% of all the time it takes to run the test
o On the server, normalizing the incoming Dn costs 7% of the processing time
o the entries encoding is very expensive

All in all, unless we run the server on a different machine than the injectors, reliable measurements on the server are nearly impossible to take. There is too much noise...

I'd be interested to conduct larger tests on a multi-core server, with lots of memory, and a lot of entries, with external injectors, to see what kind of performance we can get...


Ditto.
 

In the next few days, I will probably fix some pending bugs. I think we can cut an M7 release by the end of this week, and make it available by next week.


That sounds great. Thanks!

--
Best Regards,
-- Alex