directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: [ApacheDS] Default partition design ideas (was: Re: [ApacheDS] Going to need to implement a splay tree)
Date Sat, 02 Feb 2008 09:22:21 GMT
Alex Karasulu wrote:
> On Jan 30, 2008 8:00 AM, Emmanuel Lecharny <elecharny@gmail.com 
> <mailto:elecharny@gmail.com>> wrote:
>
>
>     o Entries. They are stored into the MasterTable. An entry is a
>     serialized Attribute, associated with a unique identifier, which is
>     currently a Long (which allows our server to store up to 18 446
>     744 073
>     709 551 615 entries ... 18 quintillions !). 
>
>
> Yeah but you had an idea that was great today.  The id today is 
> partition specific: meaning it is unique only within a partition.  As 
> you suggested we need to make sure that it is unique to the server.  
> This will make it so we can conduct searches better which traverse 
> across multiple partitions.
>
> And we need to do this because we intend to nest partitions and do 
> away with the PartitionNexus.
To summarize : having a global ID generator induce that we have one 
place (synchronized) to ask for a new Id. The main issue with this 
approach is that we must persist this Id on disk when we stop the server 
- this is obvious - but also we should be able to ressucitate the 'next' 
Id wrom the disk if we have a crash (remember that we don't write data 
on disk immediatly, as we have a differed write mechanism. Not 
necessarily a problem, though, as we can store this id when writing the 
data on disk every N seconds (dependening on the sync timeframe)
>  
>
>     Those long must be fetched
>     quickly, as we will always get an entry by its ID. Using a BTree for
>     that is time consuming, as fetching an entry will be done in
>     O(log(N)).
>
>
> You're absolutely right a hash would be much better.  We don't need to 
> sort the ID's.
Howard has a very valid point regarding the use of Hashtable vs BTree.
>  
>
>     o Attributes with single values. Typically, 'uid'. There are
>     supposed to
>     be unique, too (ie, you can imagine that an uid can be found more than
>     once in a LDAP server (there is no reason why you should not be
>     allowed
>     to store a user's UID in many places, if this user is registred in
>     more
>     than once... 
>
>
> There is no uniqueness constraint on UID in LDAP. Perhaps you mean the 
> entry UUID attribute?
No. I was mentioning the SINGLE-VALUE vs the MULTI-VALUE. Uid is 
single-valued.
>  
>
>     For instance, I use the same uid on several computers, so I
>     may have to store these UID in different branches in my LDAP
>     server). We
>     can use a Hash table to store such an attribute, because it's supposed
>     to be unique. If you have more than one reference, then the question
>     will be : how do I manage more than one value associated with this
>     attribute (cf next point).
>     Using a Hash table here means you know that this attribute is not only
>     holding a single value, but also it's unique within the whole server.
>     Dangerous ...
>
>
> I think this is not a good idea for UID but it could be great for a 
> UUID index.
If we were to use a HashTable, then I would say : this is up to the user 
to determine if he want to use a HashTable or a Btree for an attribute's 
index. this should certainly not be a static choice we make.
>  
>
>     o All other attributes (including single valued attributes which
>     can be
>     found more than once) : HashMap is not really a good choice, except if
>     the admin *knows* that the number of values won't be enormous. 
>
>
> This is not a good idea because you need sorted access and traversal. 
> Think about an index on age.  You have the > and < operators where you 
> have to advance to some position and walk the index finding all 
> entries that comply.
In fact, the index are depending on the operation. You can use a single 
index for < or > filters, but you need another one for *=, and another 
one for substring operation. I have described some of those in the badly 
outdated page I pointed.
>
> <IDEA>
> Incidentally that just gave me an idea. The server's search mechanism 
> is a bit dumb where ranges are given.  For example if I give it this 
> range (age>45) and an index exists on the 'age' attribute then a 
> cursor is walked and all entries on the walk are returned.  So there 
> is a partial walk of the index.  Now when I give it this filter (& 
> (age>45) (age<60)) then say a cursor is chosen for the first 
> subfilter.  That walk happens and the second is used to assert that 
> the age is less than 60.  When the cursor gets to 61 for example, it 
> should stop but it does not - the walk continues but now the second 
> subfilter says nope don't return it because it's over 61.  We could 
> have in this case just abandoned the walk but the server does not 
> because it's too stupid.  Some smarts are required to merge numeric 
> ranges into a special kind of filter node.

Yeah, that's true. You need to stop a traversal as soon as you reach the 
upper limit. We have to think about how to handle this...
> </IDEA>
>  
>
>     The current implementation, as explained by
>     Alex, instead of storing a list of all the entry IDs linked to the
>     'person' value, stores a TreeSet, which is managed in a separate file.
>
>
> Not in a separate file.  The TreeSet is persisted as the value of a 
> multivalued key to implement duplicate keys.  Careful not to mix this 
> up with attributes or multivalued attributes.
>
> Note that after some threshold though we switch to using another BTree 
> to redirect to.  So when we do a lookup and find that the value is a 
> reference (BTreeRedirect) then we just use another BTree in the same 
> RecordManager (same file).
That what I meant. You have an indirection, and it costs time.I thought 
the redirected tree was stored in another tree, I'm glad this is not the 
case. The important point is that the redirected BTree is not totally 
loaded in memory.
>  
>
>
>     There is an indirection, a little bit like when you deal with a N-N
>     relation in a RDBMS context (when two tables are related with a N-N
>     relation, then you create a third table holding the relation between
>     both IDs).
>
>
> I don't get this analogy at all.  It's totally off.
It's not. Your redirectBTree is this third table. Think about it twice, 
you will see the analogy.
>
>     Can we do better ? Because this solution is costly, and not really
>     effective : you have to deal with the structure duality (you hold
>     either
>     a list or a reference to a treeset, depending on the number of
>     elements), and this make it more complex to manage cache (i'm not sure
>     at this point that we can cache such elements...)
>
>
> It's cached by the record manager.  The record manager caches blocks 
> in the db file with data not the records themselves.

Good point. BPages are loaded in cache, and holds the elements. The way 
it works is not really our business... I have mixed two different caches 
here. Forget about what I said.
>  
>
>     Now, for those attributes, I have described another solution, based on
>     the fact that we use 'long' to store IDs :
>     http://cwiki.apache.org/confluence/display/DIRxSRVx11/Backend. I think
>     this might be interesting to dig those ideas a little bit to see if it
>     fits our need.
>
>
> Yeah there were some neat ideas you had.  They're pretty experimental 
> so we need to test them out.  Would be nice if we can get our lab up 
> and running to test the differences in design.
>
> BTW this backend documentation incorrect when it explains the present 
> implementation.  Please yank that page and just extract your ideas to 
> expose them because otherwise people learn incorrect things when you 
> publish documentation with misconceptions in them.
Ok, I got the message :) This page is pretty old and contains bad 
assumptions about the way the backend is implemented. I will update it 
today (at least, I will mark some parts as outdated)
>
> I apologize if I sound negative - I rewrote the above 3 times to make 
> it softer :). 
Thanks buddy ! It does not sound negative, it's just a fact : it's 
outdated, and misleading !


-- 
--
cordialement, regards,
Emmanuel L├ęcharny
www.iktek.com
directory.apache.org



Mime
View raw message