directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Karasulu <akaras...@apache.org>
Subject Re: Dn, Rdn and Ava inconstancies
Date Mon, 20 Feb 2012 22:20:52 GMT
On Mon, Feb 20, 2012 at 11:59 PM, Emmanuel Lécharny <elecharny@gmail.com>wrote:

> Le 2/20/12 7:36 PM, Alex Karasulu a écrit :
>
>  On Mon, Feb 20, 2012 at 6:33 PM, Emmanuel Lécharny<elecharny@gmail.com>**
>> wrote:
>>
>>  Hi guys,
>>>
>>> those last days, we had to fight with some issues with the way we handle
>>> DNs and its components :
>>> - creating entries with a RDN containing two times the same AT is not
>>> allowed by the spec
>>> - searching for an entry which RDN is cn=Doe+gn=John does not work when
>>> searching for gn=John+cn=Doe
>>> - renaming an entry on itself when we want to upercase a RDN is not
>>> possible when it should.
>>>
>>> Digging a bit into the code, I found that many cases weren't handled
>>> correctly, and the the API is not consistant. We also have issues with
>>> escaped characters.
>>>
>>> For instance, if we consider the Ava class, there are some methods that
>>> need to be renamed :
>>> o getUpName() should be renamed to getName() as Dn.getName() and
>>> Rdn.getName() are used
>>> o getUpType() should be renamed to getType() to be consistant with the
>>> previous rename
>>> o getUpValue() should also be renamed to getValue() for the very same
>>> reason.
>>>
>>> Now, when it comes to what the methods produce, here is a table showing
>>> the expected values :
>>>
>>> If the AVA is not schema aware :
>>>
>>>    getNormName()    "ou=exemple \+ rdn\C3\A4\ "
>>>    getNormType()    “ou”
>>>    getNormValue()    "exemple + rdnä "
>>>    getUpName()    "OU = Exemple \+ Rdn\C3\A4\ "
>>>    getUpType()    “OU“
>>>    getUpValue()    "Exemple + Rdnä "
>>>    normalize()    "ou=exemple \+ rdn\C3\A4\ "
>>>    toString()    "OU = Exemple \+ Rdn\C3\A4\ "
>>>
>>> and if the AVA is schema aware :
>>>
>>>    getNormName() “2.5.4.11=example \+ rdn\C3\A4\ ”
>>>    getNormType() “2.4.5.11”
>>>    getNormValue() “exemple + rdnä ”
>>>    getUpName() "OU = Exemple \+ Rdn\C3\A4\ "
>>>    getUpType() “OU“
>>>    getUpValue() "Exemple + Rdnä "
>>>    normalize() “2.5.4.11=example \+ rdn\C3\A4\ ”
>>>    toString() "OU = Exemple \+ Rdn\C3\A4\ "
>>>
>>> Currently, this is not what we get :
>>>
>>>    Ava.getNormName() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
>>>    Ava.getUpValue() returns 'Exemple \+ Rdn\C3\A4\ '
>>>    Ava.normalize() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
>>>
>>> The normalize() method seems useless.
>>>
>>>
>>> For RDN, we also have some method renaming to anticipate :
>>> o getUpType() should be renamed getType()
>>> o getUpValue() should be renamed getValue()
>>> o getValue(String) should be removed, we can grab the value using the
>>> getAva( String ) instead
>>>
>>> Same, for the expected values and the values we get :
>>>
>>>    getName()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>>    getNormName()    "ou=exemple \+ rdn\C3\A4\ "
>>>    getNormType()    "ou"
>>>    getNormValue()    "exemple + rdnä "
>>>    getUpType()    “OU“
>>>    getUpValue()    "Exemple \+ Rdn\C3\A4\ "
>>>    getValue(String)    "Exemple \+ Rdn\C3\A4\ " and “TEST”
>>>    toString()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>>
>>> and if the RDN is schema aware :
>>>
>>>    getName()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>>    getNormName()    “2.5.4.11=example \+ rdn\C3\A4\ ”
>>>    getNormType()    "2.5.4.3"
>>>    getNormValue()    “exemple + rdnä “
>>>    getUpType()    “OU“
>>>    getUpValue()    "Exemple \+ Rdn\C3\A4\ “
>>>    getValue(String)    "Exemple \+ Rdn\C3\A4\ " and “TEST”
>>>    toString()    "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>>
>>> This is what we get :
>>>
>>> Rdn.getNormName() returns 'ou=Exemple \+ Rdnä\ +cn=TEST'
>>> Rdn.getNormValue() returns 'Exemple + Rdnä '
>>> Rdn.getUpValue() returns ' Exemple \+ Rdn\C3\A4\ '
>>> Rdn.getValue( 'ou' ) returns 'Exemple + Rdnä '
>>> Rdn.getValue( 'test' ) returns ''
>>>
>>> Etc...
>>>
>>> I have not yet coded the tests for the schema aware AVA and RDN, but be
>>> sure we will get more inconsistencies. I still have to write down the
>>> same
>>> analysis for Dn, but this is the same story.
>>>
>>>
>>> We really need to fix those inconsistencies otherwise we will have
>>> endless
>>> issues. This is not the first time we are dealing with them, bt so far,
>>> we
>>> never had to face them for real, and we just tried our best to shoot the
>>> errors when they appear. I think it's time to play medieval on the code !
>>>
>>>  This makes a lot of sense. As things matured in the project we started
>> seeing more and more of the corner cases that we need to account for.  As
>> you say, we did this incrementally as we encountered various situations.
>>
>> Over time this strains how this area of the library was designed.
>> Naturally
>> you cannot account for everything and over time various choices see
>> obsolete as you patch and patch and patch the code.
>>
>> Now after seeing so much of the corner cases and how the design may not be
>> supporting it cleanly, and efficiently, then sure re-architect it know
>> that
>> we have the tests, history and the knowledge.
>>
> Thanks a lot, Alex !!!
>
>
Really man I did nothing :).


> I have spent *so much* time in the past on those classes that I some point
> I felt a bit depressed that I wasn't able to design it correctly... Yes, it
> was like when a baby start to walk : you can't run before being able to
> walk...
>
>
Yep, this is the reason why we need to release early release often. Speed
up the iterations of feedback that leads to us stretching a design then
step back and refactor. It's naturally agile if we stick to the OS mantra.


> So, the main issue is the way AVA handles values. As soon as we *know*
> what we should expect when we create an AVA, then suddenly it becomes way
> easier. Basically, an AVA contains one type and one value. This value can
> be a String or a byte[], depending on the type. Saddly, if the AVA is not
> schema aware, we can't tell if the value is binary or String.
>
>
Sounds like it needs not to be schema aware but binary attribute aware: a
subset of the schema. This is the first level of correctness.

The next level depends on whether or not we have the full schema available
to properly normalize the value.


> The AVA are not intended to be used in insolation in a RDN, they are also
> representing a part of the entry (keep in mind that a RDN AVA *must* be
> present in the entry). That means AVAs are dual objects (part of the RDN
> but also part of the Entry).
>
> That being said, the AVA's value is always used internally as an UTF-8
> encoded String if it's a String, or as a byte[] if it's not a String. So
> far, so good, as we store the value in Value<?> objects.
>
>
+1


> It starts to be a bit complex when we know that we must keep the value as
> it was provided by the user, but the server requires that the value is
> normalized before being able to deal with it (it's not mandatory that we
> keep a normalized form of the value, but it's critical from a performance
> POV).
>
> That means we will keep two instances of the value : the user provided one
> and the normalized one.
>
> Wait ! The Value<?> class already does that for us ! Great... So we are
> safe, aren't we ?
>
> Not even close... The tricky part is that we must use AVAs in a RDN, and
> RDN are often exposed as Strings, where some characters must be escaped.
> For instance, '+', spaces at the end, any non ASCII chars, etc need to be
> escaped.
>
> That implies we need a escape( String ) method to do this work when we
> want to produce a String from an AVA instance. This method already exists,
> thanks god !
>
>
Yeah it's a pretty gnarly function.


> Now, come the Schame into the game. A Schema aware AVA will be able to
> determinate if the value is binary or not, and will modify the interned
> value by applying a normalization method (accordingly to the
> AttributeType). This will impact the String representation of the AVA when
> used in a RDN (not in the entry), when the user wants to manipulate the RDN
> as a String.
>
> Basically, we will have two forms for an AVA :
> - a User Provided form (the standard form)
> - a Normalized form which will differ depending on the fact that the AVA
> is schema aware, or not.
>
>
Note that whether schema aware or UN-aware AVAs will still need to be
binary type aware.


> Here is a small table representing the different forms for each of the
> methods exposing an AVA content :
>
> Considering that we want to create the following AVA :
> "OU" with the value "Exemple + Rdn\u00E4 "
>
> (note that the value has a space at the end, contains a '+' and a 'ä'
> character - which is representing using the \u00E4 unicode sequence in a
> Java string)
>
> 1) The Ava.getNormName() method will return :
>  - No Schema    : "ou=Exemple \\+ Rdn\\C3\\A4\\ "
>  - With Schema : "2.5.4.11=exemple \\+ rdn\\C3\\A4"
>
> As we can see, the getNormName() return a String so the value must be
> escaped. As we dont know what kind of normalizer to use when we don't have
> a schema, in the first case we still have 'ou' instead of '2.5.4.11', and
> the upper case chars remain upper cased. The 'ä' character is translated to
> its UTF-8 encoding and escaped counterpart, ie C3 A4. Last, not least, the
> end space is kept when we don't have a schema, but removed due to the
> normalization when we have a schema.
>
>
Is it worth removing the white space variance which we can do with or
without a schema? You don't need schema to do this right? I'm thinking it
may under certain situations prevent some problems due to case variance on
clients not loading a schema.


> 2) The Ava.getNormType() returns "ou" if we don't have a schema and
> "2.5.4.11" with a schema
>
> 3) The Ava.getNormValue().getString() will return :
>  - No Schema   : "Exemple + Rdnä "
>  - With Schema : "exemple + rdnä"
>
> Same as (1), the normalization has removed the ending space and lower
> cased everything. The big difference here is that we don't escape any
> character, as we are dealing with the Value itself, not with an element of
> the RDN. The escaped characters are only useful when used in a DN or a RDN,
> when the AVA is seen as a combinaison of an AttributeType and a Value.
>
> 4) The Ava.getUpName() will return "OU=Exemple \\+ Rdn\\C3\\A4\\ " in both
> case. It's the original value, escaped (because it's a String, likely to be
> used as a part of a DN or RDN).
>
> 5) The Ava.getUpType() returns "OU" in both cases
>
> 6) The Ava.getUpValue().toString() returns "Exemple + Rdnä " in both
> cases. The ending space is kept, as it's the way the user typed it in. No
> escaped chars, it's a Value, not an AVA.
>
> 7) The Ava.normalize() returns :
>  - No Schema   : "ou=Exemple \\+ Rdn\\C3\\A4\\ "
>  - With Schema : "2.5.4.11=exemple \\+ rdn\\C3\\A4"
>  Here again, as the AVA has been normalized, the ending space has been
> removed in the second case. The attribute has been replaced by its OID, and
> when we have no schema, it has been lower cased. All the special schars are
> escaped.
>
> 8) Last, not least, the toString() method returns the original value,
> escaped : "OU=Exemple \\+ Rdn\\C3\\A4\\ "
>
> Note : we could decide that the toString() method should return the value
> the user provided, instead of escaping characters. That would make a lot of
> sense, IMO.
>
>
+1


>
> AFAICT, shared is building fine with the changes I applied to get the
> results I exposed.
>
> I still have to move forward to get RDN and DN be consistent.
>
>
That was good summary of approaches and outcomes. I'm sure you'll knock
this out easily, we have most of the pieces needed. Most of what needs to
be done is just a refactoring.

-- 
Best Regards,
-- Alex

Mime
View raw message