directory-dev mailing list archives

From "Alex Karasulu" <akaras...@apache.org>
Subject [ApacheDS] Default partition design ideas (was: Re: [ApacheDS] Going to need to implement a splay tree)
Date Sat, 02 Feb 2008 07:59:16 GMT
On Jan 30, 2008 8:00 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:

...

There are many different things to consider. First, let's divide, then
> conquer...
>
> 1) We have many different kinds of data we want to manipulate. Let's list
> them:
> o Entries. They are stored in the MasterTable. An entry is a
> serialized Attribute, associated with a unique identifier, which is
> currently a Long (which allows our server to store up to 18 446 744 073
> 709 551 615 entries ... 18 quintillion!).


Yeah, but you had a great idea today.  The id today is partition
specific, meaning it is unique only within a partition.  As you suggested, we
need to make sure that it is unique to the server.  This will let us better
conduct searches that traverse multiple partitions.

And we need to do this because we intend to nest partitions and do away with
the PartitionNexus.
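Just to make the idea concrete (this is a throwaway sketch, not a design we
agreed on - the 16 bit split and the class name are pure assumptions on my
part), server-unique IDs could be as simple as packing a partition number
into the high bits of the 64 bit Long:

// Throwaway sketch only: one way to make entry IDs unique across the whole
// server is to reserve the high bits of the 64 bit id for a partition number
// and the low bits for the partition-local sequence.  The 16/48 split below
// is an arbitrary assumption, not an agreed design.
public final class ServerEntryIdSketch
{
    private static final int PARTITION_BITS = 16;
    private static final long LOCAL_MASK = ( 1L << ( 64 - PARTITION_BITS ) ) - 1;

    private ServerEntryIdSketch()
    {
    }

    /** Combines a partition number and a partition-local id into a server-wide id. */
    public static long toServerId( int partitionNumber, long localId )
    {
        return ( ( long ) partitionNumber << ( 64 - PARTITION_BITS ) ) | ( localId & LOCAL_MASK );
    }

    /** Extracts the partition number back out of a server-wide id. */
    public static int partitionOf( long serverId )
    {
        return ( int ) ( serverId >>> ( 64 - PARTITION_BITS ) );
    }

    /** Extracts the partition-local id back out of a server-wide id. */
    public static long localIdOf( long serverId )
    {
        return serverId & LOCAL_MASK;
    }
}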


Those Longs must be fetched
> quickly, as we will always get an entry by its ID. Using a BTree for
> that is time consuming, as fetching an entry will be done in O(log(N)).


You're absolutely right, a hash would be much better.  We don't need to sort
the IDs.

We should use a Hash to speed up entry retrieval (of course, at the
> risk that the worst case scenario turns a search operation into
> N reads, but this is unlikely...). The only point to consider is
> that this ID is incremental, so the hash function must be chosen
> carefully.


This should suit our needs:

http://jdbm.sourceforge.net/V1.0/doc/api/jdbm/htree/HTree.html
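
To make it concrete, here is roughly what the master table would look like on
top of that HTree, assuming the plain JDBM 1.0 API (the file and table names
below are just placeholders, not our partition code):

import java.io.IOException;

import jdbm.RecordManager;
import jdbm.RecordManagerFactory;
import jdbm.htree.HTree;

// Sketch of an HTree backed master table keyed by Long ids.  The names
// ("example-db", "master") are placeholders, not the real partition code.
public class HTreeMasterTableSketch
{
    public static void main( String[] args ) throws IOException
    {
        RecordManager recMan = RecordManagerFactory.createRecordManager( "example-db" );

        // Load the named HTree if it already exists, otherwise create it
        // and remember its record id under a well known name.
        long recId = recMan.getNamedObject( "master" );
        HTree master;

        if ( recId != 0 )
        {
            master = HTree.load( recMan, recId );
        }
        else
        {
            master = HTree.createInstance( recMan );
            recMan.setNamedObject( "master", master.getRecid() );
        }

        // Constant time put/get by id; the value would be the serialized entry.
        master.put( new Long( 1L ), "serialized entry goes here" );
        Object entry = master.get( new Long( 1L ) );
        System.out.println( entry );

        recMan.commit();
        recMan.close();
    }
}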


> o DNs. This is a very specific case. DNs are unique, mono-valued,
> unordered elements. DNs are not attributes, either. They are unordered, as a
> DN's ordering depends on the RDN attribute's own ordering, which can be
> different depending on the RDN's attribute. The consequence is that we
> should use a Hash to store the DNs.


That's fine.  We can do this since the JDBM HTree allows us to iterate
through keys and values even though it's not sorted, which does not matter to
us.
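
For the record, the iteration goes through a FastIterator, so walking the
whole DN table is just (sketch, the method name is made up):

import java.io.IOException;

import jdbm.helper.FastIterator;
import jdbm.htree.HTree;

// Sketch: the HTree is unsorted, but its FastIterator still lets us walk
// every DN key and look up the associated id.
public class DnHtreeWalkSketch
{
    public static void dumpDns( HTree dnTable ) throws IOException
    {
        FastIterator keys = dnTable.keys();
        Object dn;

        while ( ( dn = keys.next() ) != null )
        {
            Object id = dnTable.get( dn );
            System.out.println( dn + " -> " + id );
        }
    }
}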


> o Attributes with single values. Typically, 'uid'. They are supposed to
> be unique, too (ie, you can imagine that a uid can be found more than
> once in an LDAP server (there is no reason why you should not be allowed
> to store a user's UID in many places, if this user is registered more
> than once...


There is no uniqueness constraint on UID in LDAP. Perhaps you mean the entry
UUID attribute?


> For instance, I use the same uid on several computers, so I
> may have to store these UIDs in different branches of my LDAP server). We
> can use a Hash table to store such an attribute, because it's supposed
> to be unique. If you have more than one reference, then the question
> will be: how do I manage more than one value associated with this
> attribute (cf. next point).
> Using a Hash table here means you know that this attribute is not only
> holding a single value, but also that it's unique within the whole server.
> Dangerous ...


I think this is not a good idea for UID but it could be great for a UUID
index.


> o All other attributes (including single valued attributes which can be
> found more than once): HashMap is not really a good choice, except if
> the admin *knows* that the number of values won't be enormous.


This is not a good idea because you need sorted access and traversal. Think
about an index on age.  You have the > and < operators where you have to
advance to some position and walk the index finding all entries that comply.
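
To show what I mean, this is roughly what a >= walk over a sorted index looks
like with the JDBM BTree browse API (sketch only, assuming the index keys are
Integers; the real index code wraps this differently):

import java.io.IOException;

import jdbm.btree.BTree;
import jdbm.helper.Tuple;
import jdbm.helper.TupleBrowser;

// Sketch of why sorted access matters: with a BTree we can seek to the first
// key >= the lower bound and walk forward; a hash gives us no such positioning.
public class AgeIndexWalkSketch
{
    public static void walkFrom( BTree ageIndex, Integer lowerBound ) throws IOException
    {
        TupleBrowser browser = ageIndex.browse( lowerBound );
        Tuple tuple = new Tuple();

        while ( browser.getNext( tuple ) )
        {
            // tuple.getKey() is the age value, tuple.getValue() the entry id (or id set)
            System.out.println( tuple.getKey() + " -> " + tuple.getValue() );
        }
    }
}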

<IDEA>
Incidentally that just gave me an idea.  The server's search mechanism is a
bit dumb where ranges are given.  For example if I give it this range
(age>45) and an index exists on the 'age' attribute then a cursor is walked
and all entries on the walk are returned.  So there is a partial walk of the
index.  Now when I give it this filter (& (age>45) (age<60)) then say a
cursor is chosen for the first subfilter.  That walk happens and the second
subfilter is used to assert that the age is less than 60.  When the cursor
reaches 60, for example, it should stop but it does not - the walk continues
and the second subfilter just keeps rejecting entries because their age is
not under 60.  We could have abandoned the walk at that point but the server
does not because it's too stupid.  Some smarts are required to merge numeric
ranges into a special kind of filter node.
</IDEA>
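
Just to sketch what those smarts could look like (everything here is
hypothetical - the class name, the Integer keys, the exclusive bounds -
nothing like this exists in the code today): the conjunction collapses into a
single bounded walk that abandons the cursor once it passes the upper bound.

import java.io.IOException;

import jdbm.btree.BTree;
import jdbm.helper.Tuple;
import jdbm.helper.TupleBrowser;

// Hypothetical sketch of the merged range idea: a single bounded walk over the
// age index that starts just above the lower bound and stops as soon as it
// passes the upper bound, instead of walking to the end and rejecting the rest
// with a second assertion.
public class MergedRangeScanSketch
{
    public static void scanBetween( BTree ageIndex, int lowerExclusive, int upperExclusive )
        throws IOException
    {
        TupleBrowser browser = ageIndex.browse( new Integer( lowerExclusive + 1 ) );
        Tuple tuple = new Tuple();

        while ( browser.getNext( tuple ) )
        {
            int age = ( ( Integer ) tuple.getKey() ).intValue();

            if ( age >= upperExclusive )
            {
                break; // this is the "smarts": abandon the walk instead of post-filtering forever
            }

            System.out.println( age + " -> " + tuple.getValue() );
        }
    }
}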



> And we
> have specific cases where you have an Attribute <-> Value relation where
> an attribute may have thousands (even millions!) of values. The
> 'member' attribute is a good example. But we also have to consider the
> Attribute <-> entries relation. A search request like
> (ObjectClass='person') will use the ObjectClass index, and get all the
> entries containing the 'person' value for the ObjectClass attribute.
> Potentially, thousands...


Heh or millions hopefully.


> The current implementation, as explained by
> Alex, instead of storing a list of all the entry IDs linked to the
> 'person' value, stores a TreeSet, which is managed in a separate file.


Not in a separate file.  The TreeSet is persisted as the value of a
multivalued key to implement duplicate keys.  Careful not to mix this up
with attributes or multivalued attributes.

Note that after some threshold, though, we switch to using another BTree that we
redirect to.  So when we do a lookup and find that the value is a reference
(BTreeRedirect), then we just use another BTree in the same RecordManager
(same file).
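
Spelled out, the scheme looks roughly like this (a simplified sketch, not the
real Table code - the 512 threshold, the class names and the serialization
details are all assumptions; the redirect stand-in plays the role of
BTreeRedirect):

import java.io.IOException;
import java.io.Serializable;
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

import jdbm.RecordManager;
import jdbm.btree.BTree;

// Simplified sketch of the duplicate key scheme: below a threshold the key's
// value is an in-line TreeSet of entry ids; above it, the value becomes a small
// redirect record holding the recid of a dedicated BTree in the same
// RecordManager (so everything stays in the same file).
public class DupValuesSketch
{
    private static final int THRESHOLD = 512; // assumption: the real threshold differs

    /** Stand-in for the redirect marker stored as the key's value. */
    public static class RedirectSketch implements Serializable
    {
        public long btreeRecId;
    }

    /** Serializable comparator for the Long ids (JDBM BTrees need one). */
    public static class LongComparatorSketch implements Comparator, Serializable
    {
        public int compare( Object a, Object b )
        {
            return ( ( Long ) a ).compareTo( ( Long ) b );
        }
    }

    /**
     * Adds one more id under a key and returns the new value to store for that
     * key: the same TreeSet, or a redirect once the threshold is crossed.
     */
    public static Object addDuplicate( RecordManager recMan, Object currentValue, Long newId )
        throws IOException
    {
        if ( currentValue instanceof RedirectSketch )
        {
            // Already spilled out: just add the id to the dedicated BTree.
            BTree idTree = BTree.load( recMan, ( ( RedirectSketch ) currentValue ).btreeRecId );
            idTree.insert( newId, "", true );
            return currentValue;
        }

        TreeSet ids = ( currentValue == null ) ? new TreeSet() : ( TreeSet ) currentValue;
        ids.add( newId );

        if ( ids.size() <= THRESHOLD )
        {
            return ids; // still small enough to live in-line as the key's value
        }

        // Crossed the threshold: move the ids into their own BTree and redirect.
        BTree idTree = BTree.createInstance( recMan, new LongComparatorSketch() );

        for ( Iterator it = ids.iterator(); it.hasNext(); )
        {
            idTree.insert( it.next(), "", true );
        }

        RedirectSketch redirect = new RedirectSketch();
        redirect.btreeRecId = idTree.getRecid();
        return redirect;
    }
}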


>
> There is an indirection, a little bit like when you deal with an N-N
> relation in an RDBMS context (when two tables are related with an N-N
> relation, then you create a third table holding the relation between
> both IDs).


I don't get this analogy at all.  It's totally off.

Can we do better? Because this solution is costly, and not really
> effective: you have to deal with the structural duality (you hold either
> a list or a reference to a TreeSet, depending on the number of
> elements), and this makes it more complex to manage the cache (I'm not sure
> at this point that we can cache such elements...)


It's cached by the record manager.  The record manager caches the blocks of
the db file that hold the data, not the records themselves.


> Alex's proposal (using Splay trees) won't solve the problem, IMHO. It's
> just a better way to store data in memory, with extended navigation
> possibilities, but it does not help when it comes to storing this
> structure on the disk.


Right, exactly!  I had no intention at all of using a splay tree for anything
other than as a replacement for a BTree.


> As the data are moved frequently, even on read,
> this will increase the cost of searching so much that it will kill the
> main advantage of LDAP vs RDBMS. So, yes, we can use Splay trees in
> memory, but we can't store splay trees on disk.


Yeah I agree.  We can use splay trees for caches or for these low duplicate
count records.


>
> Another point to take into consideration is the fact that we
> may run short of memory, but we don't want the server to stop running
> just because we got an OOM exception. One way to handle this would be to
> use weak references for the cache. I have no idea what it may imply for a
> Splay tree implementation, but this must be studied.


I agree that we need to have a cache with weak references for sure.  We need
to evaluate all this at some point.
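
For what it's worth, the simplest shape of such a cache is something like the
sketch below (whether we want weak or soft references, and how this interacts
with a splay tree, is exactly what needs studying; the class name is made up):

import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Sketch of an entry cache whose values the GC can reclaim under memory
// pressure, so the cache itself cannot drive the server into an OOM.
public class WeakEntryCacheSketch
{
    private final Map cache = new HashMap(); // Long -> WeakReference to the entry

    public synchronized void put( Long id, Object entry )
    {
        cache.put( id, new WeakReference( entry ) );
    }

    public synchronized Object get( Long id )
    {
        WeakReference ref = ( WeakReference ) cache.get( id );

        if ( ref == null )
        {
            return null;
        }

        Object entry = ref.get();

        if ( entry == null )
        {
            cache.remove( id ); // the referent was collected; drop the stale mapping
        }

        return entry;
    }
}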


> Now, for those attributes, I have described another solution, based on
> the fact that we use 'long' to store IDs:
> http://cwiki.apache.org/confluence/display/DIRxSRVx11/Backend. I think
> it might be interesting to dig into those ideas a little bit to see if they
> fit our needs.


Yeah, there were some neat ideas you had.  They're pretty experimental so we
need to test them out.  It would be nice if we could get our lab up and running
to test the differences in design.

BTW this backend documentation is incorrect where it explains the present
implementation.  Please yank that page and just extract your ideas to expose
them, because otherwise people learn incorrect things when you publish
documentation with misconceptions in it.

I apologize if I sound negative - I rewrote the above 3 times to make it
softer :).

Note: Alex's need is more urgent, so I guess that we won't be able to
> implement something very different from what we currently have. I just
> wanted to present some of the ideas I've been playing with for years now...


Yeah they're great ideas.  We just need to have a solid SLAMD lab and start
testing these ideas out.  I got the machines:

9 load injectors
1 SLAMD Server
1 beefy server for running ApacheDS

We just need someone to step up and help us manage this environment.

Any volunteers would be appreciated.

Thanks,
Alex
