directory-api mailing list archives

From Emmanuel Lécharny
Subject Re: DN parsing troubles...
Date Thu, 08 Oct 2015 18:02:04 GMT
Ok, moving forward...

Let's first give you some insight about the full problem.

A Dn is a list of Rdns, an Rdn is a list of Avas, and an Ava is an
attribute, the '=' sign and a value. When we parse a DN, we have to
create a Dn instance which contains all those elements. The DN string
representation is governed by RFC 4514, which describes what to do when
some special characters are present (escaping them).
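
To make the nesting concrete, here is a minimal sketch of that structure. The class names are hypothetical and heavily simplified; they are not the real Dn, Rdn and Ava classes of the API, and no escaping is applied when rendering. The '+' and ',' separators are the ones RFC 4514 uses between Avas and between Rdns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: NOT the real Dn/Rdn/Ava classes of the API.
final class DnModelSketch {

    /** An Ava: an attribute type, the '=' sign and a value. */
    static final class Ava {
        final String type;
        final String value;

        Ava(String type, String value) {
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            return type + "=" + value;
        }
    }

    /** An Rdn: a list of Avas, joined by '+' in the string form. */
    static final class Rdn {
        final List<Ava> avas = new ArrayList<>();

        Rdn add(String type, String value) {
            avas.add(new Ava(type, value));
            return this;
        }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>();
            for (Ava ava : avas) {
                parts.add(ava.toString());
            }
            return String.join("+", parts);
        }
    }

    /** A Dn: an ordered list of Rdns, joined by ',' in the string form. */
    static final class Dn {
        final List<Rdn> rdns = new ArrayList<>();

        Dn add(Rdn rdn) {
            rdns.add(rdn);
            return this;
        }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>();
            for (Rdn rdn : rdns) {
                parts.add(rdn.toString());
            }
            return String.join(",", parts);
        }
    }
}
```

A Dn built from an Rdn holding two Avas (cn and sn) followed by an Rdn holding one Ava (ou) renders with '+' inside the first Rdn and ',' between the two Rdns.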

Bottom line, after having parsed the DN, we have a representation of it
in the Dn instance, and we have two ways to get its value: as it was
provided by the user (User Provided value, or UPValue) and as a
normalized value (trimmed, escaped chars converted to their counterpart,
...).

This is all good, but we also have one more step: applying the schema,
which will transform each attribute and value to get a normalized form
dictated by the AttributeType (typically, we may lowercase the
chars, remove duplicate spaces, etc.). And this is where we are in
trouble!

For instance, consider this RDN: "cn = Test\ ". This is an RDN in which
the valid value, once normalized, is 'test ' (well, this is what we
should get). Here, the last space is not removed because it is escaped
in the RDN, so it must be kept in the resulting normalized value. But
this is not what happens :/

Let's see how the parser works:
- first, we parse the Rdn. We obtain this data structure:

    Rdn :
      upName : 'cn= TEST\ '
      normName : 'cn=TEST\ ' (the useless spaces have been removed)
      Ava :
        upName : 'cn= TEST\ '
        upType : 'cn'
        normType : 'cn'
        Value :
            wrappedValue : 'TEST\ '
            normalizedValue : 'TEST ' (the escaped space has been
            replaced by a plain space)

Everything is correct so far.
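
The step from wrappedValue to normalizedValue above (turning 'TEST\ ' into 'TEST ') can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical and not part of the API, and the handling is deliberately lenient (any backslash followed by a character that is not part of a hex pair is taken literally, which is looser than RFC 4514 allows).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the API: resolves RFC 4514 style escapes.
final class RdnValueUnescapeSketch {

    /** Appends one code point to the buffer as UTF-8 bytes. */
    private static void writeCodePoint(ByteArrayOutputStream out, int codePoint) {
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
    }

    /** Converts a user-provided value such as "TEST\ " into "TEST ". */
    static String unescape(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length()) {
            if (raw.charAt(i) != '\\') {
                // Plain character: copy it unchanged.
                int cp = raw.codePointAt(i);
                writeCodePoint(out, cp);
                i += Character.charCount(cp);
                continue;
            }
            if (i + 1 >= raw.length()) {
                throw new IllegalArgumentException("Dangling backslash at index " + i);
            }
            int hi = Character.digit(raw.charAt(i + 1), 16);
            int lo = (i + 2 < raw.length()) ? Character.digit(raw.charAt(i + 2), 16) : -1;
            if (hi >= 0 && lo >= 0) {
                // Backslash + two hex digits: one raw byte of the UTF-8 encoding.
                out.write((hi << 4) | lo);
                i += 3;
            } else {
                // Backslash + single character: keep that character literally.
                int cp = raw.codePointAt(i + 1);
                writeCodePoint(out, cp);
                i += 1 + Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The important point for what follows is that the escaped trailing space survives as a real, significant space in the result.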

- now, we apply the SchemaManager on this Rdn instance, which is
propagated to the Ava and the Value. This is done by calling the
Dn.rdnOidToName( Rdn, SchemaManager ) method, which itself calls the
atavOidToName( Ava, SchemaManager ) method. This method will transform
the Attribute into the associated OID found in the AttributeType
registry ( for 'cn'), and then will try to normalize the Value
with:

    atavValue = new StringValue( attributeType, value.getString() );

value.getString() returns the wrappedValue (i.e., 'TEST\ '), and the
StringValue class will apply the AttributeType to this value:

    public StringValue( AttributeType attributeType, String value )
        throws LdapInvalidAttributeValueException
    {
        this( value, value );
        apply( attributeType );
    }

Bottom line, it calls:

    normalizedValue = ( T ) normalizer.normalize( ( String ) wrappedValue );

The Normalizer we use here is the CachingDeepTrimToLowerNormalizer
class, which is just (atm) a wrapper around the
DeepTrimToLowerNormalizer class (ie, we don't keep a cache atm). Here is
what this normalizer does :

     * Normalizer which trims down whitespace replacing multiple whitespace
     * characters on the edges and within the string with a single space
     * thereby preserving tokenization order - while doing all this in the
     * same pass it lower cases all characters.
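
As a rough illustration of the behaviour that comment describes, here is a minimal single-pass sketch. The class and method names are hypothetical, and this is not the actual DeepTrimToLowerNormalizer code. It treats its input as plain text with no notion of escaping, so when it is fed the raw 'TEST\ ' it keeps the backslash and drops the space.

```java
// Hypothetical sketch of a "deep trim to lower" pass; not the library's code.
final class DeepTrimToLowerSketch {

    /**
     * Drops leading and trailing whitespace, collapses inner whitespace runs
     * to one space, and lower-cases everything, in a single pass.
     * Escape sequences are NOT recognised: a backslash is an ordinary char.
     */
    static String deepTrimToLower(String value) {
        StringBuilder out = new StringBuilder(value.length());
        boolean pendingSpace = false;

        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);

            if (Character.isWhitespace(c)) {
                // Remember the gap, but only once some text has been emitted,
                // so leading whitespace never produces a space.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(Character.toLowerCase(c));
            }
        }

        // A pending space at the end is trailing whitespace: it is dropped.
        return out.toString();
    }
}
```

With this sketch, "  Hello   World " becomes "hello world", while the raw 'TEST\ ' becomes 'test\': the space that was meant to be significant is gone.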

This normalizer calls this method:

    PrepareString.normalize( value.getString(), PrepareString.StringType.CASE_IGNORE );

which will remove spurious spaces around the value by calling the
insignifiantSpacesStringAscii( str, IGNORE_CASE ) method, which will
remove the escaped space :/

At this point, we are screwed because we have transformed the value
unduly: the insignifiantSpacesStringAscii method does not check
that the space is escaped...
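
To show the kind of check that is missing, here is a minimal sketch of an escape-aware strip. It is not the fix applied in the code base, and the names are hypothetical. The idea is simply that a space counts as escaped when it is preceded by an odd number of backslashes, and an escaped space must stop the trimming.

```java
// Hypothetical sketch of escape-aware trimming; not the actual PrepareString fix.
final class EscapeAwareTrimSketch {

    /** True when the char at index is preceded by an odd number of backslashes. */
    static boolean isEscaped(String raw, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && raw.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    /** Strips leading and trailing spaces, but keeps a trailing escaped space. */
    static String stripInsignificantSpaces(String raw) {
        int start = 0;
        int end = raw.length();

        // A leading space has nothing before it that could escape it.
        while (start < end && raw.charAt(start) == ' ') {
            start++;
        }

        // Stop at the first trailing space that is escaped: it is significant.
        while (end > start && raw.charAt(end - 1) == ' ' && !isEscaped(raw, end - 1)) {
            end--;
        }

        return raw.substring(start, end);
    }
}
```

Applied to ' TEST\ ', this keeps 'TEST\ ' intact, so a later unescaping step can still produce 'TEST '. An escaped backslash followed by a plain space ('TEST\\ ') is handled as well: two backslashes is an even count, so the space is not escaped and gets stripped.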

Bottom line, we have something that works when the RDN is not schema
aware, and I have fixed the Value, the Dn complex parser and the tests.
I now have to fix the prepare string method. That should be done tomorrow...

Note that the change in Value will change the way we serialize the DN,
hopefully for the better, as we don't store 2 values per AVA, but only
one, saving some space.
