directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: [ApacheDS] Going to need to implement a splay tree
Date Wed, 30 Jan 2008 13:00:12 GMT
Hi Alex,

I will comment your mail extensively, with some ideas I have.

Alex Karasulu wrote:
> Background:
>
> ApacheDS used JDBM a B+Tree implementation in it's default partition 
> implementation. JDBM does not allow duplicate keys (same key with many 
> values) and so we abstract this using the Table interface.  Tables 
> support duplicate keys.  So to allow this JdbmTable which implements 
> this interface uses a TreeSet or something called a BTreeRedirect to 
> manage the values of duplicate keys.  When some threshold of values is 
> reached like say 50 for the same key, the JDBM table switches from 
> using a TreeSet to using a BTreeRedirect. A BTreeRedirect is a pointer 
> to another BTree in the same db file. The BTree pointed to contains 
> the values of the duplicate key as it's keys.
>
> Problem:
>
> I was looking into the new Cursor interface and came to the 
> realization that we can no longer use TreeSets efficiently or properly 
> for that matter.  To build a Cursor on this we would have to be able 
> to traverse up and down as the user of the Cursor desired and there's 
> simply no way to do this. 
Considering that a treeset internally contains a NavigableMap, traverse 
a TreeSet is possible. The problem is that you can't move backward as 
soon as you have moved forward, without fetching a new reference of the 
TreeSet. It's basically an enumaration. The question is : do we need to 
move back and forth ?
>
> Proposal:
>
> Implement an in memory linked splay tree with splay Cursor using the 
> links for traversal.  Splay trees are idea for recurring lookups which 
> would be the case for us.  Furthermore splay trees are great for 
> caches.  Splay trees reposition the most frequently accessed nodes 
> close to the root of the tree making access times faster.  They 
> perform poorly though when inserting in an ordered fashion. I think 
> this is acceptable for our use of this data structure for a TreeSet 
> (which uses a red-black tree) and ideal for use in devising a better 
> entry cache higher up in the server.
>
> Thoughts?
There are many different things to consider. First, let's divide, then 
conquer...

1) We have many different kind of data we want to manipulate. Let's list 
them :
o Entries. They are stored into the MasterTable. An entry is a 
serialized Attribute, associated with a unique identifier, which is 
currently a Long (which allows our server to store up to 18 446 744 073 
709 551 615 entries ... 18 quintillions !). Those long must be fetched 
quickly, as we will always get an entry by its ID. Using a BTree for 
that is time consuming, as fetching an entry will be done in O(log(N)). 
We should use a Hash to speedup entries retrieval (of course, at the 
risk that the worst case scenario can transform a search operation to 
cost N read, but this is unlikely...). The only point to consider is 
that this ID is incremental, so the hash function must be chosen carefully.

o DNs. This is a very specific case. DNs are unique, mono valued, 
unordered elements. DNs are not attributes, too. It's unordered, as a DN 
order depends on the RDN attribute's own order, which can be different 
depending on the RDN's attribute. The consequence is that we should use 
a Hash to store the DNs.

o Attributes with single values. Typically, 'uid'. There are supposed to 
be unique, too (ie, you can imagine that an uid can be found more than 
once in a LDAP server (there is no reason why you should not be allowed 
to store a user's UID in many places, if this user is registred in more 
than once... For instance, I use the same uid on several computers, so I 
may have to store these UID in different branches in my LDAP server). We 
can use a Hash table to store such an attribute, because it's supposed 
to be unique. If you have more than one reference, then the question 
will be : how do I manage more than one value associated with this 
attribute (cf next point).
Using a Hash table here means you know that this attribute is not only 
holding a single value, but also it's unique within the whole server. 
Dangerous ...

o All other attributes (including single valued attributes which can be 
found more than once) : HashMap is not really a good choice, except if 
the admin *knows* that the number of values won't be enormous. And we 
have  specific cases where you have a Attribute <-> Value relation where 
an attribute may have thousands (even millions !) of values. The 
'member' attribute is a good example. But we also have to consider the 
ATtribute <-> entries relation. A search request like 
(ObjectClass='person') will use the ObjectClass index, and get all the 
entries containing the 'person' value for tje ObjectClass attribute. 
Potentially, thousands... The current implementation, as explained by 
Alex, instead of storing a list of all the entry IDs linked to the 
'person' value, stores a TreeSet, which is managed in a separate file. 
There is an indirection, a little bit like when you deal with a N-N 
relation in a RDBMS context (when two tables are related with a N-N 
relation, then you create a third table holding the relation between 
both IDs).
Can we do better ? Because this solution is costly, and not really 
effective : you have to deal with the structure duality (you hold either 
a list or a reference to a treeset, depending on the number of 
elements), and this make it more complex to manage cache (i'm not sure 
at this point that we can cache such elements...)
Alex proposal (using Splay trees) won't solve the problem, IMHO. It's 
just a better way to store data in memory, with extended navigation 
possibilities, but it does not help  when it comes to store this 
structure on the disk. As the data are moved frequently, even on read, 
this will increase the cost of searching so much that it will kill the 
main advantage of LDAP vs RDBMS. So, yes, we can use Splay trees in 
memory, but we can't store splay trees on disk.

Some other points to take into consideration is also the fact that we 
may run short of memory, but we don't want the server to stop running, 
just because we got an OOM exception. One way to handle this would be to 
use weak references for the cache.I have no idea what it may imply for a 
Splay tree implementation, but this must be studied.

Now, for those attributes, I have described another solution, based on 
the fact that we use 'long' to store IDs : 
http://cwiki.apache.org/confluence/display/DIRxSRVx11/Backend. I think 
this might be interesting to dig those ideas a little bit to see if it 
fits our need.


Note : Alex's need is more urgent, so I guess that we won't be able to 
implement something very different than what we currently have. I just 
wanted to present some of the ideas i'm playing with for years now... 
(but I never had enough time to go any further than drafting some small 
pages on confluence ;)


-- 
--
cordialement, regards,
Emmanuel L├ęcharny
www.iktek.com
directory.apache.org



Mime
View raw message