Hi Stefan,

On Wed, Apr 2, 2008 at 3:27 PM, Stefan Seelmann <seelmann@apache.org> wrote:
Hi Alex,


Alex Karasulu wrote:

Stefan S.,

Do we use a secondary cache for Studio? Just wondering because of the performance issues someone noted on the user list when dealing with a very large directory.  

No, we don't use a cache for Studio. I tried to use ehcache a long time ago, but it was really slow when it swapped entries to disk and back.

Today we have the following:
- a HashMap<String, IEntry>: with the DN as key and the entry as value (IEntry is a Studio internal interface with some implementations)
- a HashMap<IEntry, AttributeInfo>: as soon as the attributes of an entry are loaded, an AttributeInfo containing all attributes and other information is created and put into this map
- a HashMap<IEntry, ChildrenInfo>: as soon as child entries are loaded, a ChildrenInfo containing all child entries is created and put into this map.
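
For readers who don't know the Studio code, here is a minimal sketch of that three-map layout. Only the names IEntry, AttributeInfo and ChildrenInfo come from the description above; their members, plus SimpleEntry, EntryCacheSketch and all method names, are assumptions made up for illustration and are not the actual Studio classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: the real IEntry, AttributeInfo and ChildrenInfo are
// Studio-internal types whose members are not shown in this thread.
interface IEntry {
    String getDn();
}

class SimpleEntry implements IEntry {
    private final String dn;
    SimpleEntry(String dn) { this.dn = dn; }
    public String getDn() { return dn; }
}

class AttributeInfo {
    // attribute name -> values
    final Map<String, List<String>> attributes = new HashMap<>();
}

class ChildrenInfo {
    final List<IEntry> children = new ArrayList<>();
}

class EntryCacheSketch {
    // DN -> entry
    private final Map<String, IEntry> dnToEntry = new HashMap<>();
    // Filled lazily, once the attributes of an entry have been loaded.
    private final Map<IEntry, AttributeInfo> entryToAttributes = new HashMap<>();
    // Filled lazily, once the children of an entry have been loaded.
    private final Map<IEntry, ChildrenInfo> entryToChildren = new HashMap<>();

    IEntry addEntry(String dn) {
        return dnToEntry.computeIfAbsent(dn, SimpleEntry::new);
    }

    void attributeLoaded(IEntry entry, String name, String value) {
        entryToAttributes.computeIfAbsent(entry, e -> new AttributeInfo())
                .attributes.computeIfAbsent(name, n -> new ArrayList<>())
                .add(value);
    }

    void childLoaded(IEntry parent, IEntry child) {
        entryToChildren.computeIfAbsent(parent, e -> new ChildrenInfo())
                .children.add(child);
    }

    int size() { return dnToEntry.size(); }
    AttributeInfo attributesOf(IEntry entry) { return entryToAttributes.get(entry); }
    ChildrenInfo childrenOf(IEntry entry) { return entryToChildren.get(entry); }
}
```

Nothing is ever removed from these maps, so every loaded entry stays strongly referenced and the heap grows with the number of entries browsed.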

For sure, this does not scale well. With the default VM parameters (64MB heap) you can load only about 30,000 entries before you get an OutOfMemoryError :-(((

I also think that the switch from the old DN/RDN implementation to the shared-ldap LdapDN/Rdn implementation costs some memory. We should consider doing some tests for performance and memory consumption.


Oh, that's not good.  We need to clean up that code anyway, so it might be good to work in some optimization.

Emmanuel had a good idea at some point to build a simple parser for DNs along with a simpler LdapDN class for handling most general cases.  If this parser fails then another corner-case parser continues where the first left off.

All these crazy and complicated corner cases, like multi-attribute Rdns and character issues, cost more memory.  They can then be handled by this special DN parser with its respective special LdapDN object that has additional structures for tracking these complex DNs.

If 99% of the time the simple LDAP DNs are used with a smaller footprint, then we can reduce complexity and memory usage while increasing performance.  This will have an impact on both ApacheDS and Studio.
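
To make the two-stage idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: TwoStageDnParser, Result, parseSimple and hasSpecial are made-up names, not the shared-ldap API. The JDK's javax.naming.ldap.LdapName stands in for the corner-case parser, and for simplicity the fallback re-parses the whole DN instead of continuing where the fast path stopped.

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;

// Hypothetical names throughout; this is not the shared-ldap LdapDN API.
class TwoStageDnParser {

    /** Outcome of a parse: how many RDNs were found and which parser found them. */
    static final class Result {
        final int rdnCount;
        final boolean usedFastPath;

        Result(int rdnCount, boolean usedFastPath) {
            this.rdnCount = rdnCount;
            this.usedFastPath = usedFastPath;
        }
    }

    // Fast path: accepts only comma-separated "type=value" RDNs without
    // escapes, quotes, multi-valued RDNs ('+') or other special characters.
    // Returns null when the DN is not simple, so the caller can fall back.
    static List<String[]> parseSimple(String dn) {
        List<String[]> rdns = new ArrayList<>();
        if (dn.isEmpty()) {
            return rdns; // the empty DN has no RDNs
        }
        for (String part : dn.split(",", -1)) {
            int eq = part.indexOf('=');
            if (eq <= 0) {
                return null;
            }
            String type = part.substring(0, eq).trim();
            String value = part.substring(eq + 1).trim();
            if (type.isEmpty() || value.isEmpty() || hasSpecial(type) || hasSpecial(value)) {
                return null;
            }
            rdns.add(new String[] { type, value });
        }
        return rdns;
    }

    // Deliberately conservative: anything suspicious is left to the fallback.
    private static boolean hasSpecial(String s) {
        for (int i = 0; i < s.length(); i++) {
            if ("\\\"+<>;#=".indexOf(s.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    static Result parse(String dn) {
        List<String[]> simple = parseSimple(dn);
        if (simple != null) {
            return new Result(simple.size(), true);
        }
        try {
            // Stand-in for the corner-case parser: the JDK's own DN parser.
            return new Result(new LdapName(dn).size(), false);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Invalid DN: " + dn, e);
        }
    }
}
```

With this sketch, parse("cn=alice,ou=users,dc=example,dc=com") stays on the fast path, while a DN with an escaped comma or a multi-attribute Rdn drops to the fallback. The memory saving would come from the simple case keeping only a small type/value pair per Rdn.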
 


The idea occurred to me that the JDBM code for JdbmTable and JdbmIndex could potentially be used by Studio to help solve some of the caching problems.  If this is something you think will help we can move this code into shared so both the server and Studio can leverage them.

Yeah, that sounds great. Do you want me to look at how the JDBM code works? I guess we will have some time at ApacheCon for that.

I was just pointing it out if you were interested in using it.  This is just a wrapper that abstracts most BTree implementations with a common interface.  The JDBM implementation which we use by default might be handy for a secondary cache to swap entries out.
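
As a rough illustration of what "a secondary cache to swap entries out" could mean, here is a sketch of a bounded in-memory map that spills its least recently used entries to a second-level store and swaps them back in on access. SecondaryStore, MapStore and TwoLevelCache are hypothetical names. The interface does not mirror the real JdbmTable API; a BTree-backed implementation would be plugged in behind it, and the in-memory MapStore is only there so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface standing in for a disk-backed table (for example a
// JDBM-based one). It does not mirror the real JdbmTable API.
interface SecondaryStore<K, V> {
    void put(K key, V value);
    V remove(K key);
}

// In-memory stand-in so the sketch runs without any external library.
class MapStore<K, V> implements SecondaryStore<K, V> {
    final Map<K, V> data = new HashMap<>();
    public void put(K key, V value) { data.put(key, value); }
    public V remove(K key) { return data.remove(key); }
}

class TwoLevelCache<K, V> {
    private final SecondaryStore<K, V> store;
    private final LinkedHashMap<K, V> primary;

    TwoLevelCache(final int maxInMemory, SecondaryStore<K, V> secondary) {
        this.store = secondary;
        // accessOrder = true keeps the least recently used entry first.
        this.primary = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    // Swap the eldest entry out instead of dropping it.
                    store.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        store.remove(key); // avoid keeping a stale copy in the secondary store
        primary.put(key, value);
    }

    V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            value = store.remove(key);
            if (value != null) {
                primary.put(key, value); // swap back in; may push another entry out
            }
        }
        return value;
    }

    int inMemorySize() { return primary.size(); }
}
```

The heap then holds at most maxInMemory entries, at the price of a store round trip whenever a swapped-out entry is touched again.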
 
We can talk about it at AC if you want to use it.  Also, here's a link to the documentation (which I am really proud of :D):

    http://cwiki.apache.org/confluence/display/DIRxSRVx11/Index+and+IndexEntry

Regards,
Alex