On Mon, Feb 20, 2012 at 11:59 PM, Emmanuel Lécharny <elecharny@gmail.com> wrote:
Le 2/20/12 7:36 PM, Alex Karasulu a écrit :

On Mon, Feb 20, 2012 at 6:33 PM, Emmanuel Lécharny<elecharny@gmail.com>wrote:

Hi guys,

those last days, we had to fight with some issues with the way we handle
DNs and its components :
- creating entries with a RDN containing two times the same AT is not
allowed by the spec
- searching for an entry which RDN is cn=Doe+gn=John does not work when
searching for gn=John+cn=Doe
- renaming an entry on itself when we want to upercase a RDN is not
possible when it should.

Digging a bit into the code, I found that many cases weren't handled
correctly, and the the API is not consistant. We also have issues with
escaped characters.

For instance, if we consider the Ava class, there are some methods that
need to be renamed :
o getUpName() should be renamed to getName() as Dn.getName() and
Rdn.getName() are used
o getUpType() should be renamed to getType() to be consistant with the
previous rename
o getUpValue() should also be renamed to getValue() for the very same

Now, when it comes to what the methods produce, here is a table showing
the expected values :

If the AVA is not schema aware :

   getNormName()    "ou=exemple \+ rdn\C3\A4\ "
   getNormType()    “ou”
   getNormValue()    "exemple + rdnä "
   getUpName()    "OU = Exemple \+ Rdn\C3\A4\ "
   getUpType()    “OU“
   getUpValue()    "Exemple + Rdnä "
   normalize()    "ou=exemple \+ rdn\C3\A4\ "
   toString()    "OU = Exemple \+ Rdn\C3\A4\ "

and if the AVA is schema aware :

   getNormName() “ \+ rdn\C3\A4\ ”
   getNormType() “”
   getNormValue() “exemple + rdnä ”
   getUpName() "OU = Exemple \+ Rdn\C3\A4\ "
   getUpType() “OU“
   getUpValue() "Exemple + Rdnä "
   normalize() “ \+ rdn\C3\A4\ ”
   toString() "OU = Exemple \+ Rdn\C3\A4\ "

Currently, this is not what we get :

   Ava.getNormName() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
   Ava.getUpValue() returns 'Exemple \+ Rdn\C3\A4\ '
   Ava.normalize() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '

The normalize() method seems useless.

For RDN, we also have some method renaming to anticipate :
o getUpType() should be renamed getType()
o getUpValue() should be renamed getValue()
o getValue(String) should be removed, we can grab the value using the
getAva( String ) instead

Same, for the expected values and the values we get :

   getName()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
   getNormName()    "ou=exemple \+ rdn\C3\A4\ "
   getNormType()    "ou"
   getNormValue()    "exemple + rdnä "
   getUpType()    “OU“
   getUpValue()    "Exemple \+ Rdn\C3\A4\ "
   getValue(String)    "Exemple \+ Rdn\C3\A4\ " and “TEST”
   toString()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"

and if the RDN is schema aware :

   getName()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
   getNormName()    “ \+ rdn\C3\A4\ ”
   getNormType()    ""
   getNormValue()    “exemple + rdnä “
   getUpType()    “OU“
   getUpValue()    "Exemple \+ Rdn\C3\A4\ “
   getValue(String)    "Exemple \+ Rdn\C3\A4\ " and “TEST”
   toString()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"

This is what we get :

Rdn.getNormName() returns 'ou=Exemple \+ Rdnä\ +cn=TEST'
Rdn.getNormValue() returns 'Exemple + Rdnä '
Rdn.getUpValue() returns ' Exemple \+ Rdn\C3\A4\ '
Rdn.getValue( 'ou' ) returns 'Exemple + Rdnä '
Rdn.getValue( 'test' ) returns ''


I have not yet coded the tests for the schema aware AVA and RDN, but be
sure we will get more inconsistencies. I still have to write down the same
analysis for Dn, but this is the same story.

We really need to fix those inconsistencies otherwise we will have endless
issues. This is not the first time we are dealing with them, bt so far, we
never had to face them for real, and we just tried our best to shoot the
errors when they appear. I think it's time to play medieval on the code !

This makes a lot of sense. As things matured in the project we started
seeing more and more of the corner cases that we need to account for.  As
you say, we did this incrementally as we encountered various situations.

Over time this strains how this area of the library was designed. Naturally
you cannot account for everything and over time various choices see
obsolete as you patch and patch and patch the code.

Now after seeing so much of the corner cases and how the design may not be
supporting it cleanly, and efficiently, then sure re-architect it know that
we have the tests, history and the knowledge.
Thanks a lot, Alex !!!

Really man I did nothing :).
I have spent *so much* time in the past on those classes that I some point I felt a bit depressed that I wasn't able to design it correctly... Yes, it was like when a baby start to walk : you can't run before being able to walk...

Yep, this is the reason why we need to release early release often. Speed up the iterations of feedback that leads to us stretching a design then step back and refactor. It's naturally agile if we stick to the OS mantra.
So, the main issue is the way AVA handles values. As soon as we *know* what we should expect when we create an AVA, then suddenly it becomes way easier. Basically, an AVA contains one type and one value. This value can be a String or a byte[], depending on the type. Saddly, if the AVA is not schema aware, we can't tell if the value is binary or String.

Sounds like it needs not to be schema aware but binary attribute aware: a subset of the schema. This is the first level of correctness.

The next level depends on whether or not we have the full schema available to properly normalize the value.
The AVA are not intended to be used in insolation in a RDN, they are also representing a part of the entry (keep in mind that a RDN AVA *must* be present in the entry). That means AVAs are dual objects (part of the RDN but also part of the Entry).

That being said, the AVA's value is always used internally as an UTF-8 encoded String if it's a String, or as a byte[] if it's not a String. So far, so good, as we store the value in Value<?> objects.

It starts to be a bit complex when we know that we must keep the value as it was provided by the user, but the server requires that the value is normalized before being able to deal with it (it's not mandatory that we keep a normalized form of the value, but it's critical from a performance POV).

That means we will keep two instances of the value : the user provided one and the normalized one.

Wait ! The Value<?> class already does that for us ! Great... So we are safe, aren't we ?

Not even close... The tricky part is that we must use AVAs in a RDN, and RDN are often exposed as Strings, where some characters must be escaped. For instance, '+', spaces at the end, any non ASCII chars, etc need to be escaped.

That implies we need a escape( String ) method to do this work when we want to produce a String from an AVA instance. This method already exists, thanks god !

Yeah it's a pretty gnarly function. 
Now, come the Schame into the game. A Schema aware AVA will be able to determinate if the value is binary or not, and will modify the interned value by applying a normalization method (accordingly to the AttributeType). This will impact the String representation of the AVA when used in a RDN (not in the entry), when the user wants to manipulate the RDN as a String.

Basically, we will have two forms for an AVA :
- a User Provided form (the standard form)
- a Normalized form which will differ depending on the fact that the AVA is schema aware, or not.

Note that whether schema aware or UN-aware AVAs will still need to be binary type aware.
Here is a small table representing the different forms for each of the methods exposing an AVA content :

Considering that we want to create the following AVA :
"OU" with the value "Exemple + Rdn\u00E4 "

(note that the value has a space at the end, contains a '+' and a 'ä' character - which is representing using the \u00E4 unicode sequence in a Java string)

1) The Ava.getNormName() method will return :
 - No Schema    : "ou=Exemple \\+ Rdn\\C3\\A4\\ "
 - With Schema : " \\+ rdn\\C3\\A4"

As we can see, the getNormName() return a String so the value must be escaped. As we dont know what kind of normalizer to use when we don't have a schema, in the first case we still have 'ou' instead of '', and the upper case chars remain upper cased. The 'ä' character is translated to its UTF-8 encoding and escaped counterpart, ie C3 A4. Last, not least, the end space is kept when we don't have a schema, but removed due to the normalization when we have a schema.

Is it worth removing the white space variance which we can do with or without a schema? You don't need schema to do this right? I'm thinking it may under certain situations prevent some problems due to case variance on clients not loading a schema.
2) The Ava.getNormType() returns "ou" if we don't have a schema and "" with a schema

3) The Ava.getNormValue().getString() will return :
 - No Schema   : "Exemple + Rdnä "
 - With Schema : "exemple + rdnä"

Same as (1), the normalization has removed the ending space and lower cased everything. The big difference here is that we don't escape any character, as we are dealing with the Value itself, not with an element of the RDN. The escaped characters are only useful when used in a DN or a RDN, when the AVA is seen as a combinaison of an AttributeType and a Value.

4) The Ava.getUpName() will return "OU=Exemple \\+ Rdn\\C3\\A4\\ " in both case. It's the original value, escaped (because it's a String, likely to be used as a part of a DN or RDN).

5) The Ava.getUpType() returns "OU" in both cases

6) The Ava.getUpValue().toString() returns "Exemple + Rdnä " in both cases. The ending space is kept, as it's the way the user typed it in. No escaped chars, it's a Value, not an AVA.

7) The Ava.normalize() returns :
 - No Schema   : "ou=Exemple \\+ Rdn\\C3\\A4\\ "
 - With Schema : " \\+ rdn\\C3\\A4"
 Here again, as the AVA has been normalized, the ending space has been removed in the second case. The attribute has been replaced by its OID, and when we have no schema, it has been lower cased. All the special schars are escaped.

8) Last, not least, the toString() method returns the original value, escaped : "OU=Exemple \\+ Rdn\\C3\\A4\\ "

Note : we could decide that the toString() method should return the value the user provided, instead of escaping characters. That would make a lot of sense, IMO.


AFAICT, shared is building fine with the changes I applied to get the results I exposed.

I still have to move forward to get RDN and DN be consistent.

That was good summary of approaches and outcomes. I'm sure you'll knock this out easily, we have most of the pieces needed. Most of what needs to be done is just a refactoring.

Best Regards,
-- Alex