directory-dev mailing list archives

From Howard Chu <...@symas.com>
Subject Re: Cache : there is some room for improvement...
Date Tue, 03 Dec 2013 22:20:33 GMT
Emmanuel Lécharny wrote:
> Hi !
>
> The latest numbers I got are quite interesting, now that we are correctly
> leveraging the caches (alias cache, ParentIdAndRdn cache aka PIAR cache,
> entry cache). Still, the way we configure and initialize the caches is
> far from perfect. Here is a summary of some findings I gathered over
> the last few weeks.
>
> 1) Cache is critical to performance.
> When we process a search, there are many places where we access the
> backend (be it JDBM or Mavibot), and we would gain by not doing so. By
> adding a cache for Aliases and ParentIdAndRdn, I was able to get a 25%
> speed improvement (assuming the cache is hit every time). The same goes
> for the entry cache: having all the entries loaded into the cache is a
> major speed factor.
>
> So we definitely need big entry, alias and ParentIdAndRdn caches.
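To make the hit/miss trade-off concrete, here is a minimal read-through LRU cache in plain Java. It is a sketch, not the ApacheDS CacheService: the class name `ReadThroughLruCache` and the `loader` function (standing in for a JDBM or Mavibot lookup) are hypothetical, and it uses an access-ordered `LinkedHashMap` rather than EhCache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through LRU cache. On a miss, the loader stands in for a
// backend (JDBM/Mavibot) read; on a hit, the backend is not touched at all.
class ReadThroughLruCache<K, V> {
    private final Function<K, V> loader;
    private final LinkedHashMap<K, V> map;
    private long hits;
    private long misses;

    ReadThroughLruCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration starts at the least recently accessed entry
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the LRU entry once over capacity
            }
        };
    }

    V get(K key) {
        V value = map.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key); // backend access only happens on a miss
        if (value != null) {       // null results are not cached
            map.put(key, value);
        }
        return value;
    }

    long hits() { return hits; }
    long misses() { return misses; }
    int size() { return map.size(); }
}
```

With a capacity of 1, nearly every lookup in a search that touches more than one entry turns into a miss, and therefore into a backend read.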
>
> 2) The cache configuration is not perfect.
> I discovered that the entry cache was initialized to hold a single
> entry... Obviously, that's a bit tight. But the problem is that
> whatever configuration you set, it won't change!
> So I fixed that (the ugly way).
> The real problem is that the cache configuration and initialization are a
> mess... We use a CacheService class (good thing!) which is not
> initialized in some tests, so in many parts of the code I had to check
> that the cache is not null before using it. This has to be fixed. The
> various caches (aliases, entry, PIAR) aren't all initialized in
> AbstractBTreePartition, for instance.
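One common way to get rid of scattered null checks is to never hand out a null cache: when no CacheService is configured (as in some tests), fall back to a no-op implementation. This is a hypothetical sketch of the Null Object pattern; `EntryCache`, `NoOpEntryCache`, `MapEntryCache` and `Partition` are illustrative names, not ApacheDS classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache abstraction used by a partition.
interface EntryCache {
    Object get(String id);
    void put(String id, Object entry);
}

// Null Object: caches nothing, so callers never need a null check.
final class NoOpEntryCache implements EntryCache {
    static final NoOpEntryCache INSTANCE = new NoOpEntryCache();
    private NoOpEntryCache() {}
    public Object get(String id) { return null; }  // always a miss
    public void put(String id, Object entry) { }    // silently ignored
}

// Simple unbounded in-memory cache, for illustration only.
final class MapEntryCache implements EntryCache {
    private final Map<String, Object> map = new HashMap<>();
    public Object get(String id) { return map.get(id); }
    public void put(String id, Object entry) { map.put(id, entry); }
}

// Hypothetical partition: the cache is resolved once, at construction time.
class Partition {
    private final EntryCache entryCache;

    Partition(EntryCache configured) {
        this.entryCache = configured != null ? configured : NoOpEntryCache.INSTANCE;
    }

    Object lookup(String id) {
        Object entry = entryCache.get(id); // no null check on the cache itself
        if (entry == null) {
            entry = "loaded:" + id;        // stands in for a backend read
            entryCache.put(id, entry);
        }
        return entry;
    }

    EntryCache cache() { return entryCache; }
}
```

The decision "is there a cache?" is made in one place, so the lookup path stays identical whether or not a real cache was configured.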
>
> We also have several cache configurations:
> - partition cache
> - index cache
>
> It is not clear which parameter is used for which cache. We have to get
> this fixed.
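A sketch of what an unambiguous configuration might look like: one explicitly named size per cache, so nobody has to guess whether a parameter feeds the ADS-level caches or the backend page cache. `PartitionCacheConfig` and its field names are hypothetical, not existing ApacheDS configuration attributes.

```java
// Hypothetical configuration holder: one clearly named size per cache.
final class PartitionCacheConfig {
    final int entryCacheSize;       // ADS level: number of cached entries
    final int aliasCacheSize;       // ADS level: number of cached aliases
    final int piarCacheSize;        // ADS level: number of cached ParentIdAndRdn values
    final int backendPageCacheSize; // backend level: number of cached B-tree pages

    PartitionCacheConfig(int entryCacheSize, int aliasCacheSize,
                         int piarCacheSize, int backendPageCacheSize) {
        this.entryCacheSize = requirePositive("entryCacheSize", entryCacheSize);
        this.aliasCacheSize = requirePositive("aliasCacheSize", aliasCacheSize);
        this.piarCacheSize = requirePositive("piarCacheSize", piarCacheSize);
        this.backendPageCacheSize = requirePositive("backendPageCacheSize", backendPageCacheSize);
    }

    // Fail loudly on a nonsensical value instead of silently ignoring it.
    private static int requirePositive(String name, int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be > 0, got " + value);
        }
        return value;
    }
}
```

Validating at construction time means a bad or missing value surfaces at startup rather than as a cache stuck at an unintended size.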
>
> 3) Backend cache and ApacheDS cache
> The backend cache and the ADS cache are two different things. In
> Mavibot, we cache pages. In JDBM, we also cache pages. In ADS, we cache
> entries, aliases, etc. At the moment, the configuration does not make it
> clear which cache is being set (although the index cacheSize parameter is
> only used to set the backend cache size).
>
> The thing is that in JDBM each index can have its own cache, whereas
> the cache is global in Mavibot. In other words, we can't really assume
> that configuring the backend cache is something generic.
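To illustrate why a single backend cache size is not generic: with one cache per index (as described for JDBM), a budget has to be split across the indexes, while a single global cache (as described for Mavibot) takes the whole budget. This is a hypothetical sketch; `BackendCacheLayout`, the even split and the page-count unit are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: how one "backend cache size" budget (in pages) maps onto two designs.
final class BackendCacheLayout {
    private BackendCacheLayout() {}

    // Per-index caches: each index gets its own share of the budget (even split here).
    static Map<String, Integer> perIndex(int totalPages, List<String> indexes) {
        if (indexes.isEmpty()) {
            throw new IllegalArgumentException("no indexes to size");
        }
        Map<String, Integer> sizes = new LinkedHashMap<>();
        int share = totalPages / indexes.size();
        int remainder = totalPages % indexes.size();
        for (int i = 0; i < indexes.size(); i++) {
            // hand out the remainder one page at a time so the total is preserved
            sizes.put(indexes.get(i), share + (i < remainder ? 1 : 0));
        }
        return sizes;
    }

    // Global cache: a single pool shared by every B-tree.
    static Map<String, Integer> global(int totalPages) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("<global>", totalPages);
        return sizes;
    }
}
```

An even split is naive: a frequently used index would likely deserve a larger share, which is exactly the kind of per-backend knowledge a generic parameter cannot express.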
>
> In addition, we are using EhCache, with a dedicated configuration file
> for it. It would be good not to have to manipulate this file at all, and
> to have the whole cache configuration in the ADS config.
>
> Well, there is some room for improvement in this area.
>
> 4) Which cache should we favor ?
> The backend page cache is useless if the ADS caches are loaded, except
> when we use indexes. That means we need both. The thing is that what is
> expensive when browsing a BTree is not only fetching pages from the disk,
> but also deserializing them. It would be good to keep the index pages
> in memory (as we don't have any cache at the ADS level for indexes) and
> not to cache the MasterTable (as we have an EntryCache) nor the RdnIndex
> (for the same reason: we already have the PIAR cache). This requires
> some information to be propagated to the backend cache (do *not* cache
> this BTree, do cache this one...).
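The "do *not* cache this BTree, do cache this one" hint could be propagated as a per-tree policy that the page cache consults. Minimal hypothetical sketch: `PageCache`, the tree names and the `long` page ids are illustrative, and the cache is unbounded to keep the example short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongFunction;

// Hypothetical page cache honouring a per-B-tree "do not cache" hint.
// Unbounded for brevity; a real one would also need an eviction policy.
final class PageCache {
    private final Set<String> uncachedTrees = new HashSet<>();
    private final Map<String, Map<Long, Object>> pages = new HashMap<>();

    // Hint propagated from the partition, e.g. for the master table or the Rdn index.
    void doNotCache(String treeName) {
        uncachedTrees.add(treeName);
        pages.remove(treeName); // drop anything already cached for that tree
    }

    // Returns the deserialized page, reading and deserializing it on a miss.
    Object getPage(String treeName, long pageId, LongFunction<Object> readAndDeserialize) {
        if (uncachedTrees.contains(treeName)) {
            // bypass: an ADS-level cache already covers what this tree holds
            return readAndDeserialize.apply(pageId);
        }
        return pages.computeIfAbsent(treeName, t -> new HashMap<>())
                    .computeIfAbsent(pageId, id -> readAndDeserialize.apply(id));
    }

    int cachedPageCount(String treeName) {
        Map<Long, Object> treePages = pages.get(treeName);
        return treePages == null ? 0 : treePages.size();
    }
}
```

Caching the deserialized page, not the raw bytes, is what saves the deserialization cost on a hit.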
>
> There is room for improvement here.
>
> 5) What if we have enough memory ?
> 90% of the raw search time is spent cloning entries. We *have* to
> avoid cloning entries if we want better performance. This is
> what we should work on.
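One way to avoid a full clone per returned entry is to hand out a read-only view of the cached entry and copy only when a caller actually needs to modify it (copy-on-write). This is a hypothetical sketch in which attributes are modeled as a `Map<String, List<String>>`; it is not the ApacheDS `Entry` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cached entry: attribute name -> values.
final class CachedEntry {
    private final Map<String, List<String>> attributes;

    CachedEntry(Map<String, List<String>> source) {
        // Defensive copy once, when the entry enters the cache.
        Map<String, List<String>> copy = new HashMap<>();
        source.forEach((k, v) -> copy.put(k, Collections.unmodifiableList(new ArrayList<>(v))));
        this.attributes = Collections.unmodifiableMap(copy);
    }

    // Cheap: no per-search copy, the caller gets an immutable view.
    Map<String, List<String>> readOnlyView() {
        return attributes;
    }

    // Only callers that need to modify the entry pay for a deep copy.
    Map<String, List<String>> mutableCopy() {
        Map<String, List<String>> copy = new HashMap<>();
        attributes.forEach((k, v) -> copy.put(k, new ArrayList<>(v)));
        return copy;
    }
}
```

The cost moves from every search result to the (rarer) cases that modify the entry; whether the search path can work with read-only entries is the design question this hinges on.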
>
> Regardless, if we don't have enough memory, the server will end up
> hitting the disk and performance will be much lower (by at least one
> order of magnitude). This is something to keep in mind when running
> perf tests: we are NOT testing the disk performance, we are testing the
> server performance. Running a benchmark when there is not enough memory
> to keep the caches loaded is a waste of time, as the impact of disk
> reads is so huge that it will hide any improvement we make on the
> server.
>
> Soooo: we need enough memory to run the server! The problem is: how much
> memory do we need? This is the tricky part...
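A back-of-the-envelope way to approach "how much memory do we need?" is to multiply the number of entries by an average in-memory entry size, apply a factor for JVM object overhead, and compare the result with the maximum heap the JVM was given. `CacheSizing`, the overhead factor and the sample inputs in `main` are illustrative assumptions, not measurements; real per-entry sizes would have to be measured.

```java
// Hypothetical sizing helper: estimate = entries * avgEntryBytes * overheadFactor.
final class CacheSizing {
    private CacheSizing() {}

    static long estimateBytes(long entryCount, long avgEntryBytes, double overheadFactor) {
        if (entryCount < 0 || avgEntryBytes < 0 || overheadFactor < 1.0) {
            throw new IllegalArgumentException("invalid sizing input");
        }
        return (long) Math.ceil(entryCount * (double) avgEntryBytes * overheadFactor);
    }

    // Runtime.maxMemory() is the maximum heap the JVM will attempt to use
    // (Long.MAX_VALUE if there is no inherent limit).
    static boolean fitsInHeap(long requiredBytes) {
        return requiredBytes <= Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 1,000,000 entries, ~2 KiB each, 1.5x overhead.
        long required = estimateBytes(1_000_000L, 2_048L, 1.5);
        System.out.printf("estimated: %d MiB, fits in heap: %b%n",
                required / (1024 * 1024), fitsInHeap(required));
    }
}
```

Running such a check before a benchmark is one way to avoid measuring the disk instead of the server.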

This all sounds like the hoops we had to jump through to balance back-bdb/hdb 
entry cache/dn cache/index cache against BerkeleyDB caches. But from the sound 
of it, your situation is even more complicated than ours was.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
