directory-dev mailing list archives

From "Emmanuel Lecharny" <elecha...@gmail.com>
Subject Re: String preparations
Date Thu, 28 Dec 2006 14:06:21 GMT
On 12/28/06, Alex Karasulu <akarasulu@apache.org> wrote:
>
> Emmanuel Lecharny wrote:
> > Hi guys !
> >
> > I'm currently working on the implementation of RFC 4518, which says that
> > to be able to apply MatchingRules on String values, we should transform
> > them ('Prepare').
>
> :) thanks for picking up this task E.
>
> Just a note of caution.  Remember that you must *ONLY* apply string prep
> if you are comparing with non-normalized values.


String prep is for assertion values (incoming values) and for non-indexed
values. The only Strings that should be 'prepared' are those whose type is
DirectoryString (PrintableString is a subset, and so are TelephoneNumber and
teletexString...). String values which will be stored in an index have to be
prepared too. The normalization process is just the application of this
preparation. In a way, Normalizers in ADS = string preparation minus a
lot of tricky Unicode manipulations (mapping, bidi handling, etc.). Of
course, some cases must still be handled, like whether to lowercase or not.

> Normalization must apply string prep to values to produce the canonical
> representation.  Also as you know values within indices are normalized
> and hence already have string prep (pseudo string prep as I implemented
> it) applied.  So you need not apply string prep on indexed attribute
> values.  String prep must be applied when comparing values directly from
> the entry pulled out of the master table when indices are not available.


StringPrep must be applied to:
- assertion values
- attribute values which are not indexed.
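
To make the rule above concrete, here is a minimal sketch, not the ADS code. The class PrepSketch and the methods prepare() and matches() are hypothetical names, and prepare() is only a simplified stand-in (Unicode normalization, lowercasing, space collapsing) for the real six-step preparation. It assumes Java 6 or later for java.text.Normalizer.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch: when to prepare a value before comparing.
class PrepSketch {
    // Simplified stand-in for string preparation (NOT the full RFC 4518 process).
    static String prepare(String value) {
        String normalized = Normalizer.normalize(value, Normalizer.Form.NFKC);
        String folded = normalized.toLowerCase(Locale.ROOT);
        // Trim and collapse runs of whitespace into a single space.
        return folded.trim().replaceAll("\\s+", " ");
    }

    // The assertion value is always prepared. The stored value is prepared
    // only when it does not come from an index (index values are assumed
    // to have been prepared when they were written).
    static boolean matches(String assertion, String stored, boolean storedIsIndexed) {
        String left = prepare(assertion);
        String right = storedIsIndexed ? stored : prepare(stored);
        return left.equals(right);
    }

    public static void main(String[] args) {
        // Indexed value, already prepared: only the assertion is prepared.
        System.out.println(matches("  John   SMITH ", "john smith", true));  // prints true
        // Raw value from the master table: both sides are prepared.
        System.out.println(matches("john smith", "John  Smith", false));     // prints true
    }
}
```

The point of the storedIsIndexed flag is only to avoid paying for the preparation twice on values that were already prepared at indexing time.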


> > This transformation is a 6-step process, pretty boring, and somewhere in
> > the middle, there is a Normalization step, where characters may be
> > transformed to multi-characters like : "Schön" will be transformed to
> > "Scho\u0308n" (the ö is transformed to a simple 'o' plus a combining code)
>
> Yes this is an additional way to normalize IMO.


This is RFC 4518, which is a little bit too advanced for current LDAP
servers :)
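
For illustration, the "Schön" example can be reproduced with java.text.Normalizer, which only exists since Java 6 (presumably why option 2-a talks about waiting for it). This is a sketch under assumptions: the class NormalizeSketch and its helpers are hypothetical names. The 'o' plus combining code result is what a decomposing form (NFKD here) gives; as far as I read RFC 4518, its Normalize step asks for NFKC, which composes the characters again, so both are shown.

```java
import java.text.Normalizer;

// Hypothetical sketch of the Unicode normalization step.
class NormalizeSketch {
    // Compatibility decomposition: a precomposed letter becomes
    // a base letter followed by a combining mark.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Compatibility composition: base letter plus combining mark
    // is folded back into the precomposed letter when one exists.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Render non-ASCII characters as escapes so the result is visible.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "Sch\u00f6n";
        String decomposed = decompose(word);
        // Shows the 'o' followed by the escaped combining diaeresis.
        System.out.println(escape(decomposed));
        // One character longer than the original (6 instead of 5).
        System.out.println(decomposed.length());
        // Composing the decomposed form gives the original string back.
        System.out.println(compose(decomposed).equals(word));
    }
}
```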


> OK.
>
> > 2-a) Let's wait for Java 6... We are not in a hurry, the current code
> > covers 99,9999999% of all the cases.
>
> NP I'm fine with that.


Yes, I think this is reasonable...

> > Ok, ok, I think that 2-b does the trick, from my point of view. wdyt ?
>
> 2-b seems nice but I'm fine with 2-a too.  Right now we have bigger fish
> to fry than making ADS work with UNICODE based languages.  Sorry but
> other LDAP servers have taken the same approach.


Don't be sorry. Transforming ADS to comply fully with RFC 4518 would be
overkill. Let's do that in 2.0 (or maybe in 3.0 :)

Emmanuel
