directory-dev mailing list archives

From Ole Ersoy <ole.er...@gmail.com>
Subject Re: [ApacheDS][Partition] Using surrogate keys for attributeType aliases and objectClass aliases (was Re: [SCHEMA] Can two different LDAP AttributeType's have the same name?)
Date Sat, 07 Apr 2007 00:31:49 GMT


Emmanuel Lecharny wrote:
SNIP
> well, the short answer is 'no'. As stated by Alex, we should keep two
> forms: the user provided form, and the normalized form.

So the Normalized form is always minimized?

> For performance
> reasons, we should not compute the normalized form each time we do a
> search operation; this would kill the server. Keep in mind that an LDAP
> server is 99.99% read, and less than 0.01% write.

Although once the DAS gets done, people could start using
ADS as an RDB.

What I'm really trying to understand is whether
the server can be set up to do something like this,
because I think it would minimize the size of an
in-memory partition.

Right now I'm just thinking that I have an attribute
value I want to get.  I have the DN of the entry and
I know which attribute.

The DN: ou=blah ou=blah ou=blah

The Attribute Name: 
com.example.blah.blah.blah.blah.blah.blahblah.blahhhhhhh.something

So I tell JNDI to go and get this.

So it tells ADS.

Then ADS looks up
com.example.blah.blah.blah.blah.blah.blahblah.blahhhhhhh.something

in a map<name, key>
where name is the attribute name
com.example.blah.blah.blah.blah.blah.blahblah.blahhhhhhh.something
and the value is the key used to look up the value of

com.example.blah.blah.blah.blah.blah.blahblah.blahhhhhhh.something
in the entry.

So it uses the map to look this up,
and it gets a return value like "500".

Then it gets the entry with DN:
ou=blah ou=blah ou=blah

And uses the key "500"
to look up the value of
com.example.blah.blah.blah.blah.blah.blahblah.blahhhhhhh.something

This is just my impression of what would be fast and use little memory.
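
The lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not ApacheDS code: the class name SurrogateKeySketch, its methods, the DN, and the starting key of 500 are all made up for the example, and a real partition would also have to deal with normalization and persistence.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: long attribute names are swapped for short
// integer keys before anything is stored in an entry.
class SurrogateKeySketch {
    // map<name, key>: attribute name -> surrogate key
    private final Map<String, Integer> nameToKey = new HashMap<>();
    // DN -> (surrogate key -> attribute value)
    private final Map<String, Map<Integer, String>> entries = new HashMap<>();
    // Next key to hand out; 500 is an arbitrary starting point.
    private int nextKey = 500;

    // Returns the key for a name, assigning a new one on first use.
    public int keyFor(String attributeName) {
        return nameToKey.computeIfAbsent(attributeName, n -> nextKey++);
    }

    // Stores a value under the surrogate key instead of the long name.
    public void put(String dn, String attributeName, String value) {
        entries.computeIfAbsent(dn, d -> new HashMap<>())
               .put(keyFor(attributeName), value);
    }

    // Two lookups: name -> key, then (DN, key) -> value.
    // Returns null if the name or the entry is unknown.
    public String get(String dn, String attributeName) {
        Integer key = nameToKey.get(attributeName);
        Map<Integer, String> entry = entries.get(dn);
        if (key == null || entry == null) {
            return null;
        }
        return entry.get(key);
    }

    public static void main(String[] args) {
        SurrogateKeySketch partition = new SurrogateKeySketch();
        String dn = "ou=people,ou=system"; // hypothetical DN
        String name = "com.example.some.very.long.attribute.name";
        partition.put(dn, name, "some value");
        System.out.println(partition.keyFor(name));
        System.out.println(partition.get(dn, name));
    }
}
```

The long name is held once, in the name-to-key map, and every entry only carries the small integer key, which is where the memory saving would come from.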

I'm sorry, I have not really had time to understand the search
aspects yet.  I've read Alex's mail twice already, but I need to break
it down more for the concepts to sink in.  So I hope it's OK that
I write it down as I understand it.  I just want to make sure I throw
it out there as clearly as I can in case it could be useful.

For the DAS just reading and writing DataGraphs, I think this type
of partition architecture would perform really well, but for
search I can see how it can be very different.

> So it is definitely worth
> the price to spend a *lot* of time writing the data twice, in two
> different forms, rather than doing a computation for each search. Adding
> entries in ADS is 10 to 20 times slower than reading them.

Is that because they are written in many different forms during the one
write?

It would be neat if it could just write one form per configuration,
assuming a certain usage scenario.

> 
>>
>> So when I'm using JNDI to update an attribute, and the
>> key of my Attribute is 
>> "com.example.blah.blah.blah.blah.blah.blahblah.blahhhhhhh.something"
>> ApacheDS takes that key and turns it into the shortest possible number
>> it can before storing it?
> 
> yes. We have the exact equivalent of "sequences" in Oracle, and we use
> them to give a Long for each entry, and for each indexed attribute.

OK - Now it sort of sounds like we are saying the same thing I think.
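
To make sure I have the "sequence" idea right, here is a minimal sketch of a counter that hands out a Long per entry. The class name IdSequence and its method are invented for the example, and I am assuming nothing about how ApacheDS actually implements or persists its sequences.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an Oracle-style sequence: a counter that
// returns a new, strictly increasing Long on every call.
class IdSequence {
    private final AtomicLong next;

    public IdSequence(long start) {
        this.next = new AtomicLong(start);
    }

    // Returns the current value and advances the counter atomically,
    // so concurrent writers never receive the same id.
    public long nextId() {
        return next.getAndIncrement();
    }

    public static void main(String[] args) {
        IdSequence entryIds = new IdSequence(1L);
        long first = entryIds.nextId();
        long second = entryIds.nextId();
        System.out.println(first + " " + second);
    }
}
```

One such sequence per kind of thing being numbered (entries, or an indexed attribute) would keep the stored identifiers short no matter how long the names are.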

> 
>>
>> But is this really important ? Just think about the 80/20 rule
>>
>>> (and it's much closer to a 95/5) : 20 percent of all entries will be 
>>> accessed 80% of the time. A good cache will usually give you the same
>>> result (or close to) as if you put everything in memory. This is very 
>>> basic IT theory...
>>
>>
>> Yes - Totally - For search operations that type of tweaking is awesome
>> and effective.  It applies to Supply Chain Applications sometimes,
>> and other times all the data is fair game.  For instance
>> the application might be calculating Optimal Inventory Figures for all
>> SKUs and wants to do the run "Superfast", so it wants all the data
>> in memory.
> 
> Maybe in Supply Chain Apps. But an LDAP server is totally different.
> Don't think as if you only have a hammer... In your case, LDAP being
> very fast, even compared to an RDBMS, it might be interesting to use it.
> But you should also consider other elements, like the cost of writing to
> it, and the cost of a traversal (doing a full scan).

Yeah - I left that part of the DAS Design Guide out for now :-)
I'll start thinking through how searches are done as soon
as I have a prototype for just reading and writing DataGraph instances.

Thanks for putting up with all my "Brain Queries".

- Ole

SNIP
