directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Howard Chu <>
Subject Re: [ApacheDS] Default partition design ideas
Date Sat, 02 Feb 2008 18:03:13 GMT
Alex Karasulu wrote:
> Howard, thanks much for this information; it really helps and is
> extremely valuable to us. We're very lucky and grateful to have you here.

My pleasure ;)

> More inline ...

>     A good hash function is one that evenly distributes the input keys
>     across the
>     entire hash table. This makes hashing extremely cache-unfriendly
>     when doing
>     sequential traversals of a database, or sequentially bulk-loading.

> Very good point about these opposing factors. So I figure OpenLDAP just
> uses the cache in the underlying B-Tree instead of managing some kind of
> separate entry cache?

No, we have a separate entry cache too. I originally wrote back-bdb without 
any entry caching, but Jong @ IBM benchmarked/profiled it and implemented an 
entry cache for it before 2.1 was released. (Back then I was happy just to get 
it working, at speeds comparable to back-ldbm. How things change ;)

The motivation here is that the serialized data that we store in the database 
is not directly usable with our in-memory data structures. As such, even when 
the B-Tree cache is 100% effective, there's a cost associated with 
deserialization that we can avoid by using our own entry cache. I rewrote the 
entry cache for 2.2, and it's been getting slight tweaks ever since. E.g., in 
2.3 I added threaded AVL trees to our utility library so that we can do fast 
sequential traversals of various caches.

As we're still in the early days of 64 bit computing, I'd be tempted to play 
with a single-level-store concept again, where disk addresses map 1:1 to 
memory addresses. In that case, we could simply write entries in their 
in-memory form straight to disk, and eliminate a level of caching. This 
concept worked back when virtual address spaces were 32 bits but data volumes 
tended to be on the order of 24 bits. When the overall data volume approaches 
the size of the virtual address space, this kind of scheme falls down.

>      > As the data are moved frequently, even on read,
>      > this will increase the cost of searching so much that it will
>     kill the
>      > main advantage of LDAP vs RDBMS. So, yes, we can use Splay trees in
>      > memory, but we can't store splay trees on disk.
>      > Yeah I agree. We can use splay trees for caches or for these low
>      > duplicate count records.
>     You will find that anything that turns memory read operations into
>     memory
>     write operations will scale very poorly in a multiprocessor system.

> I guess this is due to synchronization. A splay based cache then defeats
> itself since it requires a splay operation (memory write) on every lookup.


There are plenty of interesting data structures/system architectures to 
explore, and now you've got the lab resources to try them out.

>      > Yeah they're great ideas. We just need to have a solid SLAMD lab and
>      > start testing these ideas out. I got the machines:
>      >
>      > 9 load injectos
>      > 1 SLAMD Server
>      > 1 beefy server for running ApacheDS>

   -- Howard Chu
   Chief Architect, Symas Corp.
   Director, Highland Sun
   Chief Architect, OpenLDAP

View raw message