directory-dev mailing list archives

From Alex Karasulu <>
Subject Re: DnNormalizer
Date Sat, 05 Feb 2005 03:21:40 GMT
Emmanuel Lecharny wrote:

>I may have missed something, so the following should be taken as no
>more than my own perception of the problem:
>I think that we should consider two cases:
>- values that are sent through PDUs
>- values that are sent through files (LDIF)
>The first case does not need normalization: it's already done while
>decoding the PDU.
When conducting a search, an LDAP server must evaluate a filter
expression composed of attribute value assertions.  Filters like

(& (locale=  SanTA BaRBara) (OU=Human     Resources  ) )

need to be evaluated.  Regardless of the whitespace or character case
variance in the values provided for these assertions (based on case
insensitive attributes), the result set should be the same.  Backends
usually build indices to rapidly look up entries within the system that
match these assertions.  Beyond these there are system indices as well.
When a directory entry is added, any attributes of the entry
corresponding to indexed attributes are normalized based on the schema
associated with the attribute.  So an attribute that is case sensitive,
like a UNIX file name, will not have its case normalized, whereas locale
and ou values will be case normalized.  So ApacheDS pays the tax of
normalization when performing write-based operations like add,
modify(dn), and delete.  This keeps searches fast; after all, LDAP is a
read-optimized store.
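
To make the write-time cost concrete, here is a minimal sketch in Python (not ApacheDS code).  The schema table, the `unixFilePath` attribute name, the `Index` class, and the choice to collapse whitespace for every attribute are all assumptions made for illustration.

```python
import re

# Hypothetical schema table: True means the attribute matches
# case-insensitively.  Attribute names here are illustrative only.
CASE_INSENSITIVE = {"locale": True, "ou": True, "unixfilepath": False}


def normalize_value(attr, value):
    """Normalize a value according to the (assumed) schema for attr."""
    # Assumption: trim the ends and collapse whitespace runs to one space
    # for every attribute; only case folding depends on the schema.
    value = re.sub(r"\s+", " ", value.strip())
    if CASE_INSENSITIVE.get(attr.lower(), False):
        value = value.lower()
    return value


class Index:
    """Maps a normalized attribute value to the ids of matching entries."""

    def __init__(self, attr):
        self.attr = attr
        self.ids_by_value = {}

    def add(self, entry_id, raw_value):
        # The normalization tax is paid once, at write time.
        key = normalize_value(self.attr, raw_value)
        self.ids_by_value.setdefault(key, set()).add(entry_id)

    def lookup(self, assertion_value):
        # At search time only the incoming assertion value is normalized.
        key = normalize_value(self.attr, assertion_value)
        return self.ids_by_value.get(key, set())


locale_index = Index("locale")
locale_index.add(1, "Santa Barbara")
print(locale_index.lookup("  SanTA BaRBara"))  # prints {1}
```

The stored index keys are never re-normalized during a search, which is the point of paying the cost on add, modify, and delete.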

Now for DNs we need to normalize them in a similar fashion and keep
both the user-provided DN, as it was when the entry was added, and the
normalized DN, which is added to a system index for entry addressing.
This way, when scanning the normalized DN index, we do not need to
normalize the values of existing entries, only the arriving DN within a
PDU.

>For LDIF files, that's quite different. Spacing should never be a problem.
>An LDAP server should store trimmed values, so no difference. A space, a
>tab, and an nbsp are different chars, so they are stored as is. If a user
>sends a space instead of an nbsp, too bad for him! (modify or delete
>orders, for instance).
You always want to keep the data that was submitted as is, so you can
return it without modification.  However, you obviously have to normalize
it before adding values to indices.  Usually the rule of thumb with
whitespace normalization is to do a deep trim without changing
tokenization order unless quotations are used to signify literal text.
In the LDAP space people call this the string prep function.
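
A deep trim of this kind might look like the following Python sketch.  It is an illustration only, not the LDAP string prep algorithm: the function name `deep_trim`, the set of characters treated as whitespace, and the use of double quotes to mark literal text are assumptions.

```python
# Assumption: only these characters count as collapsible whitespace.
WHITESPACE = " \t\r\n"


def deep_trim(text):
    """Collapse whitespace runs outside double quotes to a single space,
    drop leading and trailing whitespace, and keep quoted text verbatim."""
    out = []
    in_quotes = False
    pending_space = False  # a whitespace run was seen and not yet emitted
    for ch in text:
        if ch == '"':
            if pending_space and out:
                out.append(" ")
            pending_space = False
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)  # literal text: copy unchanged
        elif ch in WHITESPACE:
            pending_space = True
        else:
            if pending_space and out:
                out.append(" ")
            pending_space = False
            out.append(ch)
    return "".join(out)


print(deep_trim("  Human     Resources  "))  # prints Human Resources
```

Token order is untouched; only the amount of whitespace between tokens changes.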

>There may be only one especially vicious case: an LDAP client that sends a
>request without trimming spaces. (M$ could do that! Embrace and extend
>stuff). Then you are dead... Don't know if you have to deal with this
>kind of brain-dead client tier?
Clients can send anything - we must presume this.  I did not think
clients were required to normalize things.  As a matter of fact they
should not.  User data, including the DN, should be provided as is and
returned as is.  For example, suppose I added an entry and used the
following DN

(note the 5 extra whitespace characters between the words 'Wacky' and
'Users')

uid=akarasulu,ou=Wacky      Users, dc=apache, dc=org

then the client should not be changing this.  That's the way the DN
might need to appear for some crazy reason.  That's what the user may
have wanted.  So when we search and return akarasulu's entry, we
should see the DN as it was given to us.  However, behind the scenes the
server must normalize this so that a compare on the password of the user
entry,

uid=akarasulu,ou=Wacky Users, dc=apache, dc=org

still addresses the right user and returns the correct result.  I may be
wrong but this was my impression.  Please someone double check this
because it's been so long.  I may have lost my sanity here too.
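
As a rough illustration of keeping the user-provided DN while addressing entries by a normalized one, here is a Python sketch.  The `normalize_dn` function and `EntryStore` class are hypothetical, and the normalization is deliberately naive: it assumes no escaped commas, no quoting, no multi-valued RDNs, and that every attribute in the DN is case insensitive.

```python
import re


def normalize_dn(dn):
    """Naive DN normalization under the assumptions stated above."""
    parts = []
    for rdn in dn.split(","):
        attr, value = rdn.split("=", 1)
        # Deep trim the value and fold case (assumed case insensitive).
        value = re.sub(r"\s+", " ", value.strip()).lower()
        parts.append(attr.strip().lower() + "=" + value)
    return ",".join(parts)


class EntryStore:
    """Addresses entries by normalized DN, returns the DN as supplied."""

    def __init__(self):
        self.by_normalized_dn = {}

    def add(self, user_dn, attributes):
        # Keep the user-provided DN next to the entry, untouched.
        self.by_normalized_dn[normalize_dn(user_dn)] = (user_dn, attributes)

    def lookup(self, dn):
        # Only the arriving DN is normalized; stored keys already are.
        return self.by_normalized_dn.get(normalize_dn(dn))


store = EntryStore()
added_dn = "uid=akarasulu,ou=Wacky      Users, dc=apache, dc=org"
store.add(added_dn, {"uid": ["akarasulu"]})

user_dn, attrs = store.lookup("uid=akarasulu,ou=Wacky Users, dc=apache, dc=org")
print(user_dn == added_dn)  # prints True
```

The lookup with single spacing finds the entry, and the DN handed back still carries the extra whitespace the user originally supplied.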

>Am I a total fool, or just pretending that I'm sane?
>Please feel free to tell me !
Well, first off, you need to be insane to be here, so no need to talk
about sanity when we're all a little cuckoo.

No need to worry, these are all very good points.  Let's keep discussing
them until we all have a better understanding.  It will take time for
this stuff to sink in.

Really the drive for all this craziness is just setting things up
for search: to be able to match entries.  Everything else is ancillary
and just there as setup for this function, which is the heart of a
directory server.  Can't wait to talk to you about the search algorithm
when you spend time in the search engine, where all this normalization
craziness will make more sense.

Hope this helps,

>On Friday, 4 February 2005 at 21:15 -0500, Alex Karasulu wrote:
>>Alan D. Cabrera wrote:
>>>Why does it reparse the string when it's normalizing?
>>The string is reparsed because normalization is not just a matter of
>>handling whitespace.  It involves normalizing values so that case and
>>whitespace variance do not affect the outcome of addressing the entry
>>node within the namespace.  Things like the attribute schema determine
>>how this is going to happen.  However, this might not contradict what you
>>are asking; I just don't have enough info from this one-liner.
>>I think you are referring to when a non-normalized DN (user-provided
>>input) as an LdapName is converted into a string and then put through
>>the parser again.  It might not have to be, if I understand you.
