directory-dev mailing list archives

From Emmanuel Lécharny <elecha...@gmail.com>
Subject Re: Cache : there is some room for improvement...
Date Wed, 04 Dec 2013 09:15:14 GMT
On 12/3/13 11:20 PM, Howard Chu wrote:
> Emmanuel Lécharny wrote:
>> Hi !
>>
>> The latest numbers I got are quite interesting, now that we are correctly
>> leveraging the cache (alias cache, ParentIdAndRdn cache aka PIAR cache,
>> entry cache). Still, the way we configure and initialize the cache is
>> far from being perfect. I'll summarize some findings I gathered during
>> those last weeks here.
>>
>> 1) Cache is critical to performance.
>> When we process a search, there are many areas where we access the
>> backend (be it JDBM or Mavibot) and we would gain by not doing so. By
>> adding a cache for Aliases and ParentIdAndRdn, I was able to get a 25%
>> speed improvement (assuming the cache is hit every time). The very same
>> for the entry cache : having all the entries loaded into the cache is a
>> major factor of speed.
>>
>> So we need big entry, alias and ParentIdAndRdn caches, that's for
>> sure.
>>
>> 2) The cache configuration is not perfect.
>> I discovered that the entry cache was initialized with a value of 1
>> entry being cached... Obviously, it's a bit tight. But the problem is
>> that whatever configuration you set, it won't change !
>> So I fixed that (the ugly way).
>> The real problem is that the cache configuration and initialization are
>> a mess... We use a CacheService class (good thing !) which is not
>> initialized in some tests, so I had to check that the cache is not null
>> before using it in many parts of the code. This has to be fixed. The
>> various caches (aliases, entry, PIAR) aren't all initialized in
>> AbstractBTreePartition, for instance.
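
[Illustration: one common way to get rid of scattered "is the cache null ?" checks is the null-object pattern, where tests that do not set up caching receive a cache that simply stores nothing. This is a minimal sketch under assumptions: SimpleCache and AliasLookup are hypothetical names, not the ApacheDS CacheService API.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical minimal cache contract; NOT the ApacheDS CacheService API. */
interface SimpleCache<K, V> {
    V get(K key);

    void put(K key, V value);

    /** A cache that stores nothing, for tests that do not set up caching. */
    static <K, V> SimpleCache<K, V> noOp() {
        return new SimpleCache<K, V>() {
            @Override public V get(K key) { return null; }
            @Override public void put(K key, V value) { /* intentionally empty */ }
        };
    }

    /** An unbounded in-memory cache, just enough for the illustration. */
    static <K, V> SimpleCache<K, V> inMemory() {
        final Map<K, V> store = new ConcurrentHashMap<>();
        return new SimpleCache<K, V>() {
            @Override public V get(K key) { return store.get(key); }
            @Override public void put(K key, V value) { store.put(key, value); }
        };
    }
}

/** Hypothetical caller: it never has to test the cache reference for null. */
final class AliasLookup {
    private final SimpleCache<String, String> cache;
    int backendReads; // counts simulated backend accesses

    AliasLookup(SimpleCache<String, String> cache) {
        this.cache = cache;
    }

    String resolve(String aliasDn) {
        String target = cache.get(aliasDn);
        if (target == null) {
            target = readFromBackend(aliasDn);
            cache.put(aliasDn, target);
        }
        return target;
    }

    private String readFromBackend(String aliasDn) {
        backendReads++;
        return "target-of:" + aliasDn; // stand-in for a real backend read
    }
}
```

[With SimpleCache.inMemory() a repeated lookup reads the backend once; with SimpleCache.noOp() every lookup goes to the backend, but the calling code is identical in both cases.]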
>>
>> We also have various cache configurations :
>> - partition cache
>> - index cache
>>
>> It is not clear which parameter is used for which cache. We have to get
>> this fixed.
>>
>> 3) Backend cache and ApacheDS cache
>> The backend cache and the ADS cache are two different things. In
>> Mavibot, we cache Pages. In JDBM, we also cache Pages. In ADS, we cache
>> entries, aliases, etc. At the moment, the configuration does not make it
>> clear which cache is being set (although the index cacheSize parameter
>> is only used to set the backend cache size).
>>
>> The thing is that in JDBM, each single index can have its own cache,
>> whereas the cache is global in Mavibot. In other words, we can't really
>> assume that configuring the backend cache is something generic.
>>
>> Besides, we are using EhCache, with a dedicated configuration file for
>> it. It would be good not to have to manipulate this file at all, and
>> to have the whole cache configuration in the ADS config.
>>
>> Well, there is some room for improvement in this area.
>>
>> 4) Which cache should we favor ?
>> Backend page cache is useless if the ADS caches are loaded, except if we
>> are using indexes. That means we need both. The thing is that what is
>> expensive when browsing a BTree is not only fetching pages from the disk,
>> but also deserializing them. It would be good to keep the index pages
>> in memory (as we don't have any cache at the ADS level for indexes) and
>> not to cache the MasterTable (as we have an EntryCache) nor the RdnIndex
>> (for the same reason : we already have the PIAR cache). This requires
>> some information to be propagated to the backend cache (do *not* cache
>> this BTree, do cache this one...).
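
[Illustration: the "do *not* cache this BTree, do cache this one" information could be carried by a small per-tree policy table handed to the backend. A minimal sketch under assumptions: PageCachePolicy, BackendCachePolicies and the tree names "master" and "rdn" are hypothetical, and neither JDBM nor Mavibot is claimed to expose such a hook.]

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-BTree hint for the backend page cache. */
enum PageCachePolicy { CACHE_PAGES, BYPASS }

/** Hypothetical policy table, looked up by BTree name. */
final class BackendCachePolicies {
    private final Map<String, PageCachePolicy> byTree = new HashMap<>();
    private final PageCachePolicy defaultPolicy;

    BackendCachePolicies(PageCachePolicy defaultPolicy) {
        this.defaultPolicy = defaultPolicy;
    }

    BackendCachePolicies set(String treeName, PageCachePolicy policy) {
        byTree.put(treeName, policy);
        return this;
    }

    /** Trees without an explicit hint fall back to the default policy. */
    PageCachePolicy policyFor(String treeName) {
        return byTree.getOrDefault(treeName, defaultPolicy);
    }

    /**
     * Policy following the reasoning above: index pages stay cached, while
     * trees already covered by an ADS-level cache (entry cache, PIAR cache)
     * are bypassed. The tree names are illustrative only.
     */
    static BackendCachePolicies adsDefaults() {
        return new BackendCachePolicies(PageCachePolicy.CACHE_PAGES)
            .set("master", PageCachePolicy.BYPASS)
            .set("rdn", PageCachePolicy.BYPASS);
    }
}
```

[Here BackendCachePolicies.adsDefaults().policyFor("objectClass") falls back to CACHE_PAGES, while "master" and "rdn" resolve to BYPASS.]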
>>
>> There is room for improvement here.
>>
>> 5) What if we have enough memory ?
>> 90% of the raw search time is spent cloning entries. We *have* to
>> avoid cloning the entry if we want to get better performance. This is
>> what we should work on.
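
[Illustration: one possible way to avoid cloning on the read path is to make cached entries immutable, so a search can hand out the cached instance and only a modification pays for a copy. A minimal sketch under assumptions: ImmutableEntry is a hypothetical class, not the ApacheDS Entry API, and this is not a description of how ApacheDS handles it.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical immutable entry; NOT the ApacheDS Entry API. */
final class ImmutableEntry {
    private final String dn;
    private final Map<String, List<String>> attributes;

    ImmutableEntry(String dn, Map<String, List<String>> attributes) {
        this.dn = dn;
        // One defensive deep copy when the entry is built, none on reads.
        Map<String, List<String>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : attributes.entrySet()) {
            copy.put(e.getKey(),
                Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        this.attributes = Collections.unmodifiableMap(copy);
    }

    String dn() {
        return dn;
    }

    /** Returns the shared, read-only value list; no cloning involved. */
    List<String> get(String attributeType) {
        List<String> values = attributes.get(attributeType);
        return values == null ? Collections.<String>emptyList() : values;
    }

    /** Copy-on-write: returns a new entry with one attribute replaced. */
    ImmutableEntry with(String attributeType, List<String> values) {
        Map<String, List<String>> changed = new LinkedHashMap<>(attributes);
        changed.put(attributeType, values);
        return new ImmutableEntry(dn, changed);
    }
}
```

[Readers share the same value lists and cannot modify them; only with(...) allocates a new entry, and the original stays untouched.]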
>>
>> Regardless, if we don't have enough memory, at the end of the day, the
>> server will hit the disk and we will get much lower performance (by at
>> least one order of magnitude). This is something to keep in mind when
>> doing perf tests : we are NOT testing the disk performance, we are
>> testing the server performance. Running a benchmark when there is not
>> enough memory to have cache loaded is a waste of time, as the impact of
>> disk reads is so huge it will hide any improvement we can make on the
>> server.
>>
>> Soooo : we need enough memory to run the server ! The problem is : how
>> much memory do we need ? This is the tricky part...
>
> This all sounds like the hoops we had to jump through to balance
> back-bdb/hdb entry cache/dn cache/index cache against BerkeleyDB
> caches. But from the sound of it, your situation is even more
> complicated than ours was.
Well, not so much because we own the Mavibot code :-)

The real problem here is the cache configuration. Now, in LMDB, you
don't have to care about caching the pages, the OS is doing that for you.
This is not the same story with Mavibot, because we *have* to convert
pages to Java objects, which can be costly. Saving this deserialization
time is most certainly worthwhile, at least for the top levels of each
B-tree. However, and I'm pulling this out of thin air : on a decent
LDAP server installation, you should have enough memory to cache all the
entries anyway.
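
[Illustration: keeping already deserialized pages in memory can be sketched as a small LRU cache keyed by page offset, so that frequently visited pages (typically the top levels of a B-tree) are converted to Java objects only once. This is a minimal sketch under assumptions: DeserializedPageCache is a hypothetical class built on java.util.LinkedHashMap, not Mavibot's actual page cache.]

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical LRU cache of deserialized pages, keyed by page offset. */
final class DeserializedPageCache<P> {
    private final Map<Long, P> pages;
    private int loads; // number of times a page had to be deserialized

    DeserializedPageCache(final int maxPages) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.pages = new LinkedHashMap<Long, P>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, P> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the cached page, calling the loader (deserializer) on a miss. */
    synchronized P get(long offset, LongFunction<P> loader) {
        P page = pages.get(offset);
        if (page == null) {
            page = loader.apply(offset);
            loads++;
            pages.put(offset, page);
        }
        return page;
    }

    synchronized int loads() {
        return loads;
    }

    synchronized int size() {
        return pages.size();
    }
}
```

[A page that is touched on every lookup, such as a root page, stays at the recently-used end of the map and is therefore the last candidate for eviction.]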

Except that ADS is more likely to be used embedded, where the memory size
is limited and the performance is less critical (to some extent).

A lot of fun for those going to 'optimize' ApacheDS in the future :-)


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 

