jackrabbit-dev mailing list archives

From Grégory Joseph <gregory.jos...@magnolia-cms.com>
Subject Re: Unicode, NFC,NFD and node names
Date Thu, 05 Nov 2009 14:27:05 GMT
Hi Toby,

On Nov 5, 2009, at 12:26 AM, Tobias Bocanegra wrote:

> hi,
> i don't think it should be the repository's job to normalize
> paths. likewise, a good filesystem (a case-sensitive one :-)
> doesn't normalize its paths either.

Since I wrote this yesterday in quite a rush, let me stress that I'm
only talking about Unicode normalization forms. A filesystem doesn't
have to bother with that, since it doesn't have a whole slew of clients
that each decide to use one form or the other for no apparent reason.
For "fun", you might want to see this: http://www.mail-archive.com/bug-bash@gnu.org/msg05818.html

I can see why one would want to differentiate between the two forms
in *values*; in item names, not so much.
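
To make the difference between the two forms concrete, here is a minimal, self-contained sketch. It is not Jackrabbit code: the class name NormalizationDemo and the toNfc helper are made up for illustration, and it only assumes java.text.Normalizer from the JDK. It shows that the same visible name is two different Java strings in NFC and NFD, and that normalizing both sides to one form makes them compare equal.

```java
import java.text.Normalizer;

public class NormalizationDemo {

    // Hypothetical helper: bring an item name to NFC before comparing or looking it up.
    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Escapes are used so the source file's own encoding cannot change the form.
        String nfc = "f\u00F6\u00F6";     // precomposed o-with-diaeresis (U+00F6)
        String nfd = "fo\u0308o\u0308";   // 'o' followed by combining diaeresis (U+0308)

        // Same name on screen, different code point sequences.
        System.out.println("equal as-is:          " + nfc.equals(nfd));
        System.out.println("lengths:              " + nfc.length() + " vs " + nfd.length());

        // After normalizing both to NFC, the two names are equal.
        System.out.println("equal after toNfc():  " + toNfc(nfc).equals(toNfc(nfd)));
    }
}
```

Applying something like toNfc to item names only, and leaving property values untouched, is the kind of split I have in mind.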

Thoughts?

-g

> 2009/11/4 Grégory Joseph <gregory.joseph@magnolia-cms.com>:
>> fwiw, the following solves the simple problem shown by my previous example:
>>
>>    private Session wrap(final SessionImpl origSession) throws RepositoryException {
>>        final WorkspaceImpl workspace = (WorkspaceImpl) origSession.getWorkspace();
>>        final RepositoryImpl rep = (RepositoryImpl) origSession.getRepository();
>>        return new SessionImpl(rep, origSession.getSubject(), workspace.getConfig()) {
>>            public Path getQPath(String path)
>>                    throws MalformedPathException, IllegalNameException, NamespaceException {
>>                // this is the only relevant part:
>>                return super.getQPath(Normalizer.normalize(path, Normalizer.Form.NFC));
>>            }
>>        };
>>    }
>>
>> If there was a way to swap the session implementation, or the Name-
>> and/or PathResolver implementations that are used by default, I might
>> give this a spin.
>>
>> Any opinions about the whole problem?
>>
>> Cheers,
>>
>> -g
>>
>> On Nov 4, 2009, at 6:11 PM, Grégory Joseph wrote:
>>
>>> Hi list,
>>>
>>> Given the following code,
>>> import java.text.Normalizer;
>>> ...
>>>
>>>       final Session session = ...
>>>
>>>       final Repository rep = session.getRepository();
>>>       System.out.println(rep.getDescriptor("jcr.repository.name") + " "
>>>               + rep.getDescriptor("jcr.repository.version"));
>>>
>>>       final Node root = session.getRootNode();
>>>       final String name = "föö";
>>>       System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFC) = "
>>>               + Normalizer.isNormalized(name, Normalizer.Form.NFC)); // true
>>>       System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFD) = "
>>>               + Normalizer.isNormalized(name, Normalizer.Form.NFD)); // false
>>>       root.addNode(name);
>>>       session.save();
>>>
>>>       final Node node1 = root.getNode(name);
>>>       System.out.println("node1 = " + node1);
>>>       final Node node2 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFC));
>>>       System.out.println("node2 = " + node2);
>>>       final Node node3 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFD)); // fails
>>>       System.out.println("node3 = " + node3);
>>>
>>> There's a good chance fetching node3 won't work. It might depend on
>>> the underlying OS and database, but in the case of OS X and Derby,
>>> this fails. It's not that surprising, really, given that
>>> Normalizer.normalize(name, Normalizer.Form.NFC).equals(Normalizer.normalize(name, Normalizer.Form.NFD))
>>> is NOT true.
>>>
>>> Now, taking into account the fact that all sorts of clients use
>>> different normalization forms (Firefox seems to encode URL parameters
>>> with NFD, Safari with NFC; Linux uses NFC, the OS X Finder seems to
>>> favor NFD), wouldn't it be a safe bet to normalize all input at the
>>> repository level? Or do you consider this something client
>>> applications should do?
>>>
>>> ref: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
>>>
>>> Thanks for any tip, pointer, idea, feedback or reaction!
>>>
>>> Cheers,
>>>
>>> -greg
>>>
>>>
>>
>>
>>


