jackrabbit-dev mailing list archives

From Grégory Joseph <gregory.jos...@magnolia-cms.com>
Subject Unicode, NFC,NFD and node names
Date Wed, 04 Nov 2009 17:11:10 GMT
Hi list,

Given the following code,
import java.text.Normalizer;
...

         final Session session = ...

         final Repository rep = session.getRepository();
         System.out.println(rep.getDescriptor("jcr.repository.name") + " "
                 + rep.getDescriptor("jcr.repository.version"));

         final Node root = session.getRootNode();
         final String name = "föö";
         System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFC) = "
                 + Normalizer.isNormalized(name, Normalizer.Form.NFC)); // true
         System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFD) = "
                 + Normalizer.isNormalized(name, Normalizer.Form.NFD)); // false
         root.addNode(name);
         session.save();

         final Node node1 = root.getNode(name);
         System.out.println("node1 = " + node1);
         final Node node2 = root.getNode(
                 Normalizer.normalize(name, Normalizer.Form.NFC));
         System.out.println("node2 = " + node2);
         final Node node3 = root.getNode(
                 Normalizer.normalize(name, Normalizer.Form.NFD)); // fails
         System.out.println("node3 = " + node3);

There's a good chance that fetching node3 won't work. It may depend
on the underlying OS and database, but with OS X and Derby it fails.
That's not really surprising, given that
Normalizer.normalize(name, Normalizer.Form.NFC)
    .equals(Normalizer.normalize(name, Normalizer.Form.NFD))
is NOT true: the two forms are different character sequences.
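
To make the mismatch concrete, here is a minimal standalone sketch with no repository involved. The class name NfcVsNfd and its codeUnits helper are made up for illustration; only java.text.Normalizer is real API. It prints the UTF-16 code units of both forms of the same name:

```java
import java.text.Normalizer;

final class NfcVsNfd {

    // Hypothetical helper: renders each UTF-16 code unit as U+XXXX.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "föö" written with escapes so the source encoding cannot interfere.
        String name = "f\u00F6\u00F6";
        String nfc = Normalizer.normalize(name, Normalizer.Form.NFC);
        String nfd = Normalizer.normalize(name, Normalizer.Form.NFD);

        System.out.println("NFC: " + codeUnits(nfc));
        System.out.println("NFD: " + codeUnits(nfd));
        System.out.println("equal as-is: " + nfc.equals(nfd));
        // Bringing both sides to the same form makes them comparable again.
        System.out.println("equal after NFC: "
                + nfc.equals(Normalizer.normalize(nfd, Normalizer.Form.NFC)));
    }
}
```

The NFC string is three code units (f, U+00F6, U+00F6), while the NFD string is five (f, o, U+0308, o, U+0308), so a plain String comparison on the name cannot match.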

Now, given that different clients use different normalization forms
(Firefox seems to encode URL parameters with NFD, Safari with NFC;
Linux with NFC, while the OS X Finder seems to favor NFD), wouldn't it
be a safe bet to normalize all input at the repository level? Or do
you consider this something client applications should do?
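
For the client-side option, a thin helper would probably be enough. A rough sketch, assuming NFC is picked as the canonical form (an arbitrary choice here); NodeNames and canonical are made-up names, not part of Jackrabbit or the JCR API:

```java
import java.text.Normalizer;

final class NodeNames {

    private NodeNames() {
    }

    // Hypothetical helper: returns the name in NFC.
    // Input that is already in NFC is returned unchanged.
    static String canonical(String name) {
        if (Normalizer.isNormalized(name, Normalizer.Form.NFC)) {
            return name;
        }
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "f\u00F6\u00F6";       // NFC spelling of "föö"
        String decomposed = "fo\u0308o\u0308";   // NFD spelling of "föö"

        System.out.println("raw equal: " + composed.equals(decomposed));
        System.out.println("canonical equal: "
                + canonical(composed).equals(canonical(decomposed)));
    }
}
```

Client code would then pass NodeNames.canonical(name) to both root.addNode(...) and root.getNode(...), so that either spelling resolves to the same node. The obvious drawback is that every client has to remember to do it, which is why doing it once at the repository level looks tempting.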

ref: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms

Thanks for any tip, pointer, idea, feedback or reaction!

Cheers,

-greg


