directory-api mailing list archives

From Matthew Swift <matthew.sw...@oracle.com>
Subject Re: DN and valueOf( "<a DN>" ) method
Date Mon, 09 Aug 2010 14:16:15 GMT


On 28/07/10 13:07, Emmanuel Lecharny wrote:
>  On 7/28/10 11:31 AM, Stefan Seelmann wrote:
>>> I was thinking lately about the DN class. I know that OpenDS (and
>>> probably UnboundID, but I'm not sure) has a DN.valueOf( "<a DN>" )
>>> factory that returns a DN, potentially leveraging a cache associated
>>> with a ThreadLocal.
>>>
>> ...
>>> I don't think it's such a good idea:
>>> - first, as it's ThreadLocal based, you will have as many caches as
>>> you have threads processing requests. Not sure it competes with a
>>> single cache; not sure either that we can't use the memory in a
>>> better way...
>> An advantage of using ThreadLocal is that you don't need to
>> synchronize access to the cache. Could be worth measuring the performance
> Using a ConcurrentHashMap should not incur a major performance
> penalty. I mean, it *will* be more costly than not having any
> synchronization, but it sounds acceptable.


Unfortunately a CHM won't help either, since you need to manage cache
eviction, assuming that you want the cache to have a finite size.
LinkedHashMap lets you define an eviction strategy by overriding the
removeEldestEntry method, but unfortunately LHM is not thread safe.
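
For illustration, a minimal (non-thread-safe) LRU sketch along those
lines, with a placeholder maximum size and assuming some DN class:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // LRU sketch: evicts the eldest entry once the cache grows beyond
    // MAX_ENTRIES. Note that this is NOT thread safe on its own.
    final class DNCache extends LinkedHashMap<String, DN> {
        private static final int MAX_ENTRIES = 1000; // placeholder size

        DNCache() {
            super(16, 0.75f, true); // access order, i.e. LRU behaviour
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, DN> eldest) {
            return size() > MAX_ENTRIES;
        }
    }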


>
> Another possibility is to use a CopyOnWriteArraySet, but I'm afraid
> that it will crawl if many new DNs are added.

Yes - this is not an appropriate collection to use: every insertion
copies the entire backing array, and a Set doesn't give us keyed lookup
from the DN string anyway.


>> difference, I wonder if the OpenDS team did some performance analysis?


I did some testing some time back and have forgotten the exact figures
that I got. I do remember finding a substantial performance improvement
when parsing DNs with caching enabled - something like 30ns with
caching vs 300ns without, for DNs containing 4 RDN components (i.e.
about an order of magnitude, IIRC).

We implement our DNs using a recursive RDN + parent DN structure, so we
are usually able to fast-track the decoding process to just a single
RDN for DNs having a common ancestor (a pretty common case).
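
(Purely as a hypothetical sketch of that structure - not our actual
implementation - with RDN standing in for the single-component type:)

    // A DN is an RDN plus a reference to its parent DN, so decoding
    // "uid=user,dc=example,dc=com" can stop after the first RDN if
    // "dc=example,dc=com" has already been decoded and cached.
    final class DN {
        private final RDN rdn;   // null for the root DN
        private final DN parent; // null for the root DN

        DN(RDN rdn, DN parent) {
            this.rdn = rdn;
            this.parent = parent;
        }

        RDN rdn() { return rdn; }
        DN parent() { return parent; }
    }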

We opted for the ThreadLocal approach due to the synchronization
limitations of using a single global cache. However, I have often
worried about this approach as it will not scale for applications with
large numbers of threads, potentially resulting in OOM errors.
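
(Roughly what that looks like, reusing the LRU sketch above and a
hypothetical DN.decode() parse method:)

    import java.util.Map;

    // One unsynchronized LRU cache per thread: no locking needed, but
    // memory usage grows with the number of threads.
    final class ThreadLocalDNCache {
        private static final ThreadLocal<Map<String, DN>> CACHE =
            new ThreadLocal<Map<String, DN>>() {
                @Override
                protected Map<String, DN> initialValue() {
                    return new DNCache(); // LRU sketch from above
                }
            };

        static DN valueOf(String dnString) {
            Map<String, DN> cache = CACHE.get();
            DN dn = cache.get(dnString);
            if (dn == null) {
                dn = DN.decode(dnString); // hypothetical parse method
                cache.put(dnString, dn);
            }
            return dn;
        }
    }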

Another approach I have thought about is to use a single global
two-level cache consisting of a fixed-size array of LinkedHashMaps
(think of it as a Map of Maps), each one having its own
synchronization. We would then distribute the DNs over the LHMs and
amortize the synchronization costs across multiple locks (in a similar
manner to CHM).
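
(Very roughly, something like this, again reusing the LRU sketch from
above and with a placeholder stripe count:)

    import java.util.Map;

    // Striped two-level cache: a fixed array of independently
    // synchronized LRU maps, with DNs distributed by hash code so that
    // threads usually contend on different locks (CHM-style).
    final class StripedDNCache {
        private static final int STRIPES = 16; // placeholder, needs tuning

        private final Map<String, DN>[] maps;

        @SuppressWarnings("unchecked")
        StripedDNCache() {
            maps = new Map[STRIPES];
            for (int i = 0; i < STRIPES; i++) {
                maps[i] = new DNCache(); // LRU sketch from above
            }
        }

        DN get(String dnString) {
            Map<String, DN> map = mapFor(dnString);
            synchronized (map) {
                return map.get(dnString);
            }
        }

        void put(String dnString, DN dn) {
            Map<String, DN> map = mapFor(dnString);
            synchronized (map) {
                map.put(dnString, dn);
            }
        }

        private Map<String, DN> mapFor(String dnString) {
            return maps[(dnString.hashCode() & 0x7fffffff) % STRIPES];
        }
    }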

This idea needs testing. In particular, we'd need to figure out the 
optimal array size (i.e. number of locks / LHMs). For example, 
distributing the cache over 16 LHMs is not going to help much for really 
big multi-threaded apps containing 16000+ threads (1000 threads 
contending per lock).

A major problem with this approach, if we choose to use it in the
OpenDS SDK, is that common ancestor DNs (e.g. "dc=com") are going to
end up in a single LHM, so, using our current design (RDN + parent DN),
all decoding attempts will usually end up contending on the same lock
anyway :-( So we may need to change our DN implementation to better
cope with this caching strategy.

We are not alone though: a concurrent Map implementation that can be
used for caching in a similar manner to LHM is one of the most
frequently requested enhancements to the java.util.concurrent library.


> They compared the performance they get with a ThreadLocal cache and
> with no cache: the gain was noticeable (I don't have the numbers for
> OpenDS). FYI, DN parsing accounts for more or less 13% of the whole
> CPU needed internally (network excluded) to process a simple search,
> and normalization costs an extra 10%. There is most certainly a net
> potential gain in implementing a DN cache!
>
>

Agreed, DN parsing can be expensive, especially normalization.

Matt

