directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lécharny <>
Subject Re: Dn, Rdn and Ava inconstancies
Date Mon, 20 Feb 2012 21:59:04 GMT
Le 2/20/12 7:36 PM, Alex Karasulu a écrit :
> On Mon, Feb 20, 2012 at 6:33 PM, Emmanuel Lécharny<>wrote:
>> Hi guys,
>> those last days, we had to fight with some issues with the way we handle
>> DNs and its components :
>> - creating entries with a RDN containing two times the same AT is not
>> allowed by the spec
>> - searching for an entry which RDN is cn=Doe+gn=John does not work when
>> searching for gn=John+cn=Doe
>> - renaming an entry on itself when we want to upercase a RDN is not
>> possible when it should.
>> Digging a bit into the code, I found that many cases weren't handled
>> correctly, and the the API is not consistant. We also have issues with
>> escaped characters.
>> For instance, if we consider the Ava class, there are some methods that
>> need to be renamed :
>> o getUpName() should be renamed to getName() as Dn.getName() and
>> Rdn.getName() are used
>> o getUpType() should be renamed to getType() to be consistant with the
>> previous rename
>> o getUpValue() should also be renamed to getValue() for the very same
>> reason.
>> Now, when it comes to what the methods produce, here is a table showing
>> the expected values :
>> If the AVA is not schema aware :
>>     getNormName()    "ou=exemple \+ rdn\C3\A4\ "
>>     getNormType()    “ou”
>>     getNormValue()    "exemple + rdnä "
>>     getUpName()    "OU = Exemple \+ Rdn\C3\A4\ "
>>     getUpType()    “OU“
>>     getUpValue()    "Exemple + Rdnä "
>>     normalize()    "ou=exemple \+ rdn\C3\A4\ "
>>     toString()    "OU = Exemple \+ Rdn\C3\A4\ "
>> and if the AVA is schema aware :
>>     getNormName() “ \+ rdn\C3\A4\ ”
>>     getNormType() “”
>>     getNormValue() “exemple + rdnä ”
>>     getUpName() "OU = Exemple \+ Rdn\C3\A4\ "
>>     getUpType() “OU“
>>     getUpValue() "Exemple + Rdnä "
>>     normalize() “ \+ rdn\C3\A4\ ”
>>     toString() "OU = Exemple \+ Rdn\C3\A4\ "
>> Currently, this is not what we get :
>>     Ava.getNormName() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
>>     Ava.getUpValue() returns 'Exemple \+ Rdn\C3\A4\ '
>>     Ava.normalize() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
>> The normalize() method seems useless.
>> For RDN, we also have some method renaming to anticipate :
>> o getUpType() should be renamed getType()
>> o getUpValue() should be renamed getValue()
>> o getValue(String) should be removed, we can grab the value using the
>> getAva( String ) instead
>> Same, for the expected values and the values we get :
>>     getName()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>     getNormName()    "ou=exemple \+ rdn\C3\A4\ "
>>     getNormType()    "ou"
>>     getNormValue()    "exemple + rdnä "
>>     getUpType()    “OU“
>>     getUpValue()    "Exemple \+ Rdn\C3\A4\ "
>>     getValue(String)    "Exemple \+ Rdn\C3\A4\ " and “TEST”
>>     toString()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>> and if the RDN is schema aware :
>>     getName()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>     getNormName()    “ \+ rdn\C3\A4\ ”
>>     getNormType()    ""
>>     getNormValue()    “exemple + rdnä “
>>     getUpType()    “OU“
>>     getUpValue()    "Exemple \+ Rdn\C3\A4\ “
>>     getValue(String)    "Exemple \+ Rdn\C3\A4\ " and “TEST”
>>     toString()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>> This is what we get :
>> Rdn.getNormName() returns 'ou=Exemple \+ Rdnä\ +cn=TEST'
>> Rdn.getNormValue() returns 'Exemple + Rdnä '
>> Rdn.getUpValue() returns ' Exemple \+ Rdn\C3\A4\ '
>> Rdn.getValue( 'ou' ) returns 'Exemple + Rdnä '
>> Rdn.getValue( 'test' ) returns ''
>> Etc...
>> I have not yet coded the tests for the schema aware AVA and RDN, but be
>> sure we will get more inconsistencies. I still have to write down the same
>> analysis for Dn, but this is the same story.
>> We really need to fix those inconsistencies otherwise we will have endless
>> issues. This is not the first time we are dealing with them, bt so far, we
>> never had to face them for real, and we just tried our best to shoot the
>> errors when they appear. I think it's time to play medieval on the code !
> This makes a lot of sense. As things matured in the project we started
> seeing more and more of the corner cases that we need to account for.  As
> you say, we did this incrementally as we encountered various situations.
> Over time this strains how this area of the library was designed. Naturally
> you cannot account for everything and over time various choices see
> obsolete as you patch and patch and patch the code.
> Now after seeing so much of the corner cases and how the design may not be
> supporting it cleanly, and efficiently, then sure re-architect it know that
> we have the tests, history and the knowledge.
Thanks a lot, Alex !!!

I have spent *so much* time in the past on those classes that I some 
point I felt a bit depressed that I wasn't able to design it 
correctly... Yes, it was like when a baby start to walk : you can't run 
before being able to walk...

So, the main issue is the way AVA handles values. As soon as we *know* 
what we should expect when we create an AVA, then suddenly it becomes 
way easier. Basically, an AVA contains one type and one value. This 
value can be a String or a byte[], depending on the type. Saddly, if the 
AVA is not schema aware, we can't tell if the value is binary or String.

The AVA are not intended to be used in insolation in a RDN, they are 
also representing a part of the entry (keep in mind that a RDN AVA 
*must* be present in the entry). That means AVAs are dual objects (part 
of the RDN but also part of the Entry).

That being said, the AVA's value is always used internally as an UTF-8 
encoded String if it's a String, or as a byte[] if it's not a String. So 
far, so good, as we store the value in Value<?> objects.

It starts to be a bit complex when we know that we must keep the value 
as it was provided by the user, but the server requires that the value 
is normalized before being able to deal with it (it's not mandatory that 
we keep a normalized form of the value, but it's critical from a 
performance POV).

That means we will keep two instances of the value : the user provided 
one and the normalized one.

Wait ! The Value<?> class already does that for us ! Great... So we are 
safe, aren't we ?

Not even close... The tricky part is that we must use AVAs in a RDN, and 
RDN are often exposed as Strings, where some characters must be escaped. 
For instance, '+', spaces at the end, any non ASCII chars, etc need to 
be escaped.

That implies we need a escape( String ) method to do this work when we 
want to produce a String from an AVA instance. This method already 
exists, thanks god !

Now, come the Schame into the game. A Schema aware AVA will be able to 
determinate if the value is binary or not, and will modify the interned 
value by applying a normalization method (accordingly to the 
AttributeType). This will impact the String representation of the AVA 
when used in a RDN (not in the entry), when the user wants to manipulate 
the RDN as a String.

Basically, we will have two forms for an AVA :
- a User Provided form (the standard form)
- a Normalized form which will differ depending on the fact that the AVA 
is schema aware, or not.

Here is a small table representing the different forms for each of the 
methods exposing an AVA content :

Considering that we want to create the following AVA :
"OU" with the value "Exemple + Rdn\u00E4 "

(note that the value has a space at the end, contains a '+' and a 'ä' 
character - which is representing using the \u00E4 unicode sequence in a 
Java string)

1) The Ava.getNormName() method will return :
  - No Schema    : "ou=Exemple \\+ Rdn\\C3\\A4\\ "
  - With Schema : " \\+ rdn\\C3\\A4"

As we can see, the getNormName() return a String so the value must be 
escaped. As we dont know what kind of normalizer to use when we don't 
have a schema, in the first case we still have 'ou' instead of 
'', and the upper case chars remain upper cased. The 'ä' 
character is translated to its UTF-8 encoding and escaped counterpart, 
ie C3 A4. Last, not least, the end space is kept when we don't have a 
schema, but removed due to the normalization when we have a schema.

2) The Ava.getNormType() returns "ou" if we don't have a schema and 
"" with a schema

3) The Ava.getNormValue().getString() will return :
  - No Schema   : "Exemple + Rdnä "
  - With Schema : "exemple + rdnä"

Same as (1), the normalization has removed the ending space and lower 
cased everything. The big difference here is that we don't escape any 
character, as we are dealing with the Value itself, not with an element 
of the RDN. The escaped characters are only useful when used in a DN or 
a RDN, when the AVA is seen as a combinaison of an AttributeType and a 

4) The Ava.getUpName() will return "OU=Exemple \\+ Rdn\\C3\\A4\\ " in 
both case. It's the original value, escaped (because it's a String, 
likely to be used as a part of a DN or RDN).

5) The Ava.getUpType() returns "OU" in both cases

6) The Ava.getUpValue().toString() returns "Exemple + Rdnä " in both 
cases. The ending space is kept, as it's the way the user typed it in. 
No escaped chars, it's a Value, not an AVA.

7) The Ava.normalize() returns :
  - No Schema   : "ou=Exemple \\+ Rdn\\C3\\A4\\ "
  - With Schema : " \\+ rdn\\C3\\A4"
  Here again, as the AVA has been normalized, the ending space has been 
removed in the second case. The attribute has been replaced by its OID, 
and when we have no schema, it has been lower cased. All the special 
schars are escaped.

8) Last, not least, the toString() method returns the original value, 
escaped : "OU=Exemple \\+ Rdn\\C3\\A4\\ "

Note : we could decide that the toString() method should return the 
value the user provided, instead of escaping characters. That would make 
a lot of sense, IMO.

AFAICT, shared is building fine with the changes I applied to get the 
results I exposed.

I still have to move forward to get RDN and DN be consistent.

Emmanuel Lécharny

View raw message