directory-dev mailing list archives

From Alex Karasulu <akaras...@apache.org>
Subject Re: String preparations
Date Thu, 28 Dec 2006 13:47:46 GMT
Emmanuel Lecharny wrote:
> Hi guys !
> 
> I'm currently working on the implementation of RFC 4518, which says that,
> to be able to apply MatchingRules to String values, we should transform
> them first ('Prepare').
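For reference, RFC 4518 splits preparation into six steps: Transcode, Map, Normalize (NFKC), Prohibit, Check bidi and Insignificant Character Handling. A minimal sketch of that pipeline, assuming Java 6+ for java.text.Normalizer, could look like this. The class and method names (StringPrepSketch, prepare, map, ...) are hypothetical, not ApacheDS code, and several steps are deliberately partial: only a handful of the RFC mappings are done, toLowerCase() stands in for the RFC 3454 case folding, only two kinds of prohibited code points are checked, and the space handling is simplified.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical, simplified sketch of the RFC 4518 preparation pipeline (needs Java 6+).
final class StringPrepSketch {

    // Step 1, Transcode: Java Strings are already Unicode, so there is nothing to do.

    // Step 2, Map: only a small subset of the RFC 4518 mapping table is handled here.
    static String map(String s, boolean caseIgnore) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u00AD': // SOFT HYPHEN: mapped to nothing
                case '\u200B': // ZERO WIDTH SPACE: mapped to nothing
                    break;
                case '\t': case '\n': case '\u000B': case '\f': case '\r': case '\u0085':
                    sb.append(' '); // these control characters are mapped to SPACE
                    break;
                default:
                    sb.append(c);
            }
        }
        // Approximation: the RFC asks for RFC 3454 case folding, not toLowerCase().
        String mapped = sb.toString();
        return caseIgnore ? mapped.toLowerCase(Locale.ROOT) : mapped;
    }

    // Step 4, Prohibit: only U+FFFD and private-use code points are rejected in this sketch.
    static void prohibit(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp == 0xFFFD || Character.getType(cp) == Character.PRIVATE_USE) {
                throw new IllegalArgumentException("prohibited code point: " + Integer.toHexString(cp));
            }
            i += Character.charCount(cp);
        }
    }

    // Step 6, simplified: trim and collapse runs of spaces into one space. Good enough
    // for equality matching, but not the exact output form the RFC specifies.
    static String insignificantSpaces(String s) {
        return s.replaceAll(" +", " ").replaceAll("^ | $", "");
    }

    static String prepare(String value, boolean caseIgnore) {
        String s = map(value, caseIgnore);
        s = Normalizer.normalize(s, Normalizer.Form.NFKC); // Step 3, Normalize (NFKC)
        prohibit(s);
        // Step 5, Check bidi: RFC 4518 ignores bidirectional characters, so nothing here.
        return insignificantSpaces(s);
    }

    public static void main(String[] args) {
        System.out.println(prepare("  Alex\tKarasulu  ", true));                // alex karasulu
        System.out.println(prepare("Scho\u0308n", false).equals("Sch\u00F6n")); // true
    }
}
```

The order matters: mapping runs before NFKC, so characters removed or turned into spaces by the Map step never take part in normalization.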

:) thanks for picking up this task E.

Just a note of caution.  Remember that you must *ONLY* apply string prep
if you are comparing with non-normalized values.

Normalization must apply string prep to values to produce the canonical
representation.  Also, as you know, values within indices are normalized
and hence already have string prep (pseudo string prep as I implemented
it) applied.  So you need not apply string prep to indexed attribute
values.  String prep must be applied when comparing values taken directly
from the entry pulled out of the master table when indices are not available.
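A minimal sketch of that rule, assuming the assertion value has already been normalized once up front: the index path compares normalized strings directly, while the master-table path prepares the raw entry value first. PreparedCompare, prepare, matchIndexed and matchEntryValue are hypothetical names, and prepare() here is only a stand-in (NFKC, lower case, space collapsing), not the real string prep.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch: string prep is applied only to the side that is not yet normalized.
final class PreparedCompare {

    // Stand-in for the real string prep: NFKC, lower case, collapse insignificant spaces.
    static String prepare(String raw) {
        String s = Normalizer.normalize(raw, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
        return s.replaceAll(" +", " ").replaceAll("^ | $", "");
    }

    // Index path: index keys and the assertion value are already normalized,
    // so a plain comparison is enough and no string prep is applied here.
    static boolean matchIndexed(String normalizedKey, String normalizedAssertion) {
        return normalizedKey.equals(normalizedAssertion);
    }

    // Master-table path: the value comes straight from the stored entry and has not
    // been normalized, so it must be prepared before comparing.
    static boolean matchEntryValue(String rawValue, String normalizedAssertion) {
        return prepare(rawValue).equals(normalizedAssertion);
    }

    public static void main(String[] args) {
        String assertion = prepare("Emmanuel  Lecharny"); // normalized once, up front
        System.out.println(matchIndexed("emmanuel lecharny", assertion));       // true
        System.out.println(matchEntryValue("  emmanuel LECHARNY ", assertion)); // true
    }
}
```

Preparing an already normalized index key again would only cost time, which is why the index path skips it.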

> This transformation is a 6-step process, pretty boring, and somewhere in
> the middle there is a Normalization step, where characters may be
> transformed into multiple characters: "Schön" will be transformed into
> "Scho\u0308n" (the ö is transformed into a plain 'o' plus a combining
> diaeresis, U+0308)

Yes, this is an additional way to normalize, IMO.

> (note that this is *not* a good example, because the transformation we
> must implement is different. It's the NFKC transformation.) For those who
> have _nothing_ else to do, or who had an argument with their
> boyfriend/girlfriend and have a lot of time to waste waiting for him/her
> to cool down, here is the doc:
> http://www.unicode.org/unicode/reports/tr15/tr15-22.html#Specification
> 
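To make the difference between the forms concrete: Java 6 added java.text.Normalizer, so on a Java 6+ JVM they can be compared directly. This minimal sketch (the class and helper names are hypothetical) shows that the "Schön" decomposition above is NFD, that NFKC recomposes it, and that the compatibility ("K") part of NFKC folds the U+FB01 ligature into the plain letters "fi", which NFC leaves alone.

```java
import java.text.Normalizer;

// Hypothetical sketch comparing Unicode normalization forms with java.text.Normalizer (Java 6+).
final class NormalizationForms {

    static String nfd(String s)  { return Normalizer.normalize(s, Normalizer.Form.NFD); }
    static String nfc(String s)  { return Normalizer.normalize(s, Normalizer.Form.NFC); }
    static String nfkc(String s) { return Normalizer.normalize(s, Normalizer.Form.NFKC); }

    public static void main(String[] args) {
        String composed = "Sch\u00F6n"; // o with diaeresis as a single code point, U+00F6

        // NFD: the o-umlaut becomes 'o' followed by COMBINING DIAERESIS, U+0308.
        System.out.println(nfd(composed).equals("Scho\u0308n"));  // true
        // NFKC recomposes it, so both spellings end up identical.
        System.out.println(nfkc("Scho\u0308n").equals(composed)); // true
        // The compatibility part: the ligature U+FB01 is left alone by NFC,
        // but NFKC turns it into the two letters "fi".
        System.out.println(nfc("\uFB01").equals("\uFB01"));       // true
        System.out.println(nfkc("\uFB01"));                       // fi
    }
}
```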
> OK, now, the point is: in Java 5 there is nothing in the API to do
> this normalization (Java 6 has it!), but as we won't switch to Java 6,
> that leaves us with few options:
> 1) why the hell do we need to take care of those bloody countries with
> bloody letters - hieroglyphs, or whatever I can't read - that exceed the
> Beauty of US-ASCII???

I don't think we do.  It will be nice when we do, though, once we switch to J6.

> 2) damn, I'm French/German/Turkish/... (ISO-3166, pick your country) and my
> name cannot be written in US-ASCII (like Szörner, or Lécharny :). I
> have to do some normalization...

OK.

> 2-a) Let's wait for Java 6... We are not in a hurry; the current code
> covers 99.9999999% of all cases.

NP I'm fine with that.

> 2-b) Let's use the apache-abdera Unicode impl; it seems pretty complete.

That's an option.
...

> OK, OK, from my point of view I think 2-b does the trick. WDYT?

2-b seems nice, but I'm fine with 2-a too.  Right now we have bigger fish
to fry than making ADS work with Unicode-based languages.  Sorry, but
other LDAP servers have taken the same approach.
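A minimal sketch of how option 2-a might be prepared for, assuming the code must still compile and run on Java 5: java.text.Normalizer is looked up by reflection, so it gets used automatically once the server runs on Java 6, and pure US-ASCII values, which NFKC never changes, skip normalization entirely. OptionalNfkc, isAscii and nfkc are hypothetical names, and leaving non-ASCII values unnormalized on Java 5 is an assumed fallback, not something decided in this thread.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of option 2-a: use java.text.Normalizer when the JVM provides it
// (Java 6+), without compiling against it, so the same code still runs on Java 5.
final class OptionalNfkc {

    private static final Method NORMALIZE; // Normalizer.normalize(CharSequence, Form), or null
    private static final Object NFKC;      // Normalizer.Form.NFKC, or null

    static {
        Method method = null;
        Object form = null;
        try {
            Class<?> normalizer = Class.forName("java.text.Normalizer");
            Class<?> formClass = Class.forName("java.text.Normalizer$Form");
            method = normalizer.getMethod("normalize", CharSequence.class, formClass);
            form = formClass.getField("NFKC").get(null);
        } catch (Exception e) {
            method = null; // Java 5: no Normalizer available, use the fallback in nfkc()
            form = null;
        }
        NORMALIZE = method;
        NFKC = form;
    }

    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // NFKC-normalizes the value when possible. US-ASCII text is never changed by NFKC,
    // so it is returned as-is without calling the normalizer.
    static String nfkc(String s) {
        if (isAscii(s) || NORMALIZE == null) {
            return s; // ASCII fast path, or Java 5 fallback: value stays unnormalized
        }
        try {
            return (String) NORMALIZE.invoke(null, s, NFKC);
        } catch (Exception e) {
            throw new IllegalStateException("NFKC normalization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(NORMALIZE != null ? "using java.text.Normalizer" : "no Normalizer (Java 5)");
        System.out.println(nfkc("Scho\u0308n").equals("Sch\u00F6n")); // true on Java 6+
    }
}
```

The ASCII check keeps the common case cheap, which matches the observation that the current code already covers almost all values.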

Alex


