directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lecharny <>
Subject Re: [ApacheDS] Default partition design ideas
Date Sun, 03 Feb 2008 21:48:13 GMT
Howard Chu wrote:
>>     A good hash function is one that evenly distributes the input keys
>>     across the
>>     entire hash table. This makes hashing extremely cache-unfriendly
>>     when doing
>>     sequential traversals of a database, or sequentially bulk-loading.
>> Very good point about these opposing factors. So I figure OpenLDAP just
>> uses the cache in the underlying B-Tree instead of managing some kind of
>> separate entry cache?
> No, we have a separate entry cache too. I originally wrote back-bdb 
> without any entry caching, but Jong @ IBM benchmarked/profiled it and 
> implemented an entry cache for it before 2.1 was released. (Back then 
> I was happy just to get it working, at speeds comparable to back-ldbm. 
> How things change ;)
> The motivation here is that the serialized data that we store in the 
> database is not directly usable with our in-memory data structures. 
I have done some very quick-and-dirty micro benchmarks for DN 
serialization. This is not directly related to Entries, but the idea was 
to tests different solutions for serializing data.  I do a serialization 
followed by a deserialization. Here are the results :

Serialization/deserialization for 1 million  DN = "ou= Some   People   + 
dc=  And   Some anImAls,dc = eXample,dc= cOm"

 Serializing te entire DN structure (DN, RDNs, AttributeTypeAndValues) : 
28 507 ms
 Serialiazing a DN as a string and parsing it back when deserializing : 
37 400 ms
 With a transformation of this String into byte[] : 37 467 ms
 Test the Creation of a DN and Parsing (no serialization): 13 697 ms
 Normalization (no serialization): 15 643 ms

Note : those serializations are done in memory (no disk access at all)

Conclusion :
1) obviously, it's more costly to parse a DN than to rebuild the entire 
2) on my laptop, 1.7Ghz centrino, I can at best serialize and 
deserialize around 35 000 DN per second.

The point (2) is very important : an entry iis way bigger than a DN, but 
also less complex (no parsing involved). We can consider that dealing 
with an Entry will cost more. This imply that having a pre-cache (an 
entry cache) will be much more efficient than having a single page cache 
of serialized data. I would suggest that we should just cache Btree 
structure leaves (those which only contains keys), and leave the entries 
to the pre-cache. It might be the best solution (tips : to be tested and 
validatd !)

Note : I have done these performance tests before reading this mail.

> As such, even when the B-Tree cache is 100% effective, there's a cost 
> associated with deserialization that we can avoid by using our own 
> entry cache. I rewrote the entry cache for 2.2, and it's been getting 
> slight tweaks ever since. E.g., in 2.3 I added threaded AVL trees to 
> our utility library so that we can do fast sequential traversals of 
> various caches.
I don't know why AVL trees are so under-utilized... They are probably 
one of the most interesting structure when you want to balance the cost 
of sorting data and the cost of minimizing the tree height... Maybe 
because it's complex to implement compared to B-trees ?
> As we're still in the early days of 64 bit computing, I'd be tempted 
> to play with a single-level-store concept again, where disk addresses 
> map 1:1 to memory addresses. In that case, we could simply write 
> entries in their in-memory form straight to disk, and eliminate a 
> level of caching. This concept worked back when virtual address spaces 
> were 32 bits but data volumes tended to be on the order of 24 bits. 
> When the overall data volume approaches the size of the virtual 
> address space, this kind of scheme falls down.
This is a ath we might want to explore in java, as we have memory-mapped 
files. I don't know ...
>> I guess this is due to synchronization. A splay based cache then defeats
>> itself since it requires a splay operation (memory write) on every 
>> lookup.
> Right.
> There are plenty of interesting data structures/system architectures 
> to explore, and now you've got the lab resources to try them out.
OpenLdap design is just fine for such tests, as you can define a 
specific backend for a partition, so you can test a lot of different 
solutions. We are working on this too, but it's not ready yet (but will 
be soon, as Alex is on it)

Thanks Howard !

cordialement, regards,
Emmanuel L├ęcharny

View raw message