directory-dev mailing list archives

From Emmanuel Lecharny <>
Subject String preparations
Date Mon, 25 Dec 2006 21:15:33 GMT
Hi guys !

I'm currently working on the implementation of RFC 4518, which says that 
to be able to apply MatchingRules on String values, we should transform 
them first.

This transformation is a six-step process, pretty boring, and somewhere in 
the middle there is a Normalization step, where characters may be 
transformed to multi-character sequences : "Schön" will be transformed to 
"Scho\u0308n" (the ö is transformed to a simple 'o' plus a combining 
code). Note that this is *not* a good example, because the transformation 
we must implement is different : it's the NFKC transformation. For those 
who have _nothing_ else to do, or who had an argument with their 
boyfriend/girlfriend and have a lot of time to waste waiting for him/her 
to cool down, here is the doco :
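
To make the difference between the decomposed form above and NFKC concrete, here is a minimal sketch. It assumes the Java 6 `java.text.Normalizer` API (so it does not help on Java 5), and the class name `NormalizeDemo` and its helper methods are made up for illustration only.

```java
import java.text.Normalizer;

public class NormalizeDemo {
    // Renders every non-ASCII char as a backslash-u hex escape so the
    // normalized forms can be compared by eye.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Canonical decomposition: the form shown in the "Schön" example.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Compatibility decomposition followed by canonical composition.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String word = "Sch\u00f6n";
        System.out.println(escape(nfd(word)));      // 'o' followed by a combining diaeresis
        System.out.println(escape(nfkc(word)));     // stays as the single precomposed character
        System.out.println(escape(nfkc("\ufb01"))); // the "fi" ligature becomes two plain letters
    }
}
```

With NFKC the precomposed ö stays as a single character, while compatibility characters such as ligatures are folded to their plain equivalents.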

Ok, now, the point is : in Java 5, there is nothing in the API to do 
this normalization (Java 6 has it !), but as we won't switch to Java 6, it 
leaves us with few options :
1) why the hell do we need to take care of those bloody countries with 
bloody letters - hieroglyph, or whatever I can't read - that exceed the 
Beauty of US-ASCII ???
2) damn, I'm French/German/Turkish/... (ISO-3166, pick your country) and my 
name does not make it with US-ASCII (like Szörner, or Lécharny :). I 
have to do some normalization...
2-a) Let's wait for Java 6... We are not in a hurry, the current code 
covers 99,9999999% of all the cases.
2-b) Let's use apache-abdera Unicode impl, it seems pretty complete
2-c) I feel like implementing this Normalizer myself, because I LOVE 
Unicode ! (I know all of the 1 156 345 characters, and I can draw them 
knowing only their values... Actually, I also do crack, and I am a 
speaker at each Unicoke conference ...)

Ok, ok, I think that 2-b does the trick, from my point of view. wdyt ?

Emmanuel L\u00e9charny

Oh, great idea if you forgot to send a gift to your mother-in-law : the 
latest version of the Unicode spec, only 1450 pages !  :
