jackrabbit-dev mailing list archives

From Tobias Bocanegra <tri...@day.com>
Subject Re: Unicode, NFC,NFD and node names
Date Wed, 04 Nov 2009 23:26:14 GMT
hi,
i don't think normalizing paths should be the job of the repository.
likewise, a good filesystem (a case-sensitive one :-) does not
normalize its paths either.

regards, toby

2009/11/4 Grégory Joseph <gregory.joseph@magnolia-cms.com>:
> fwiw, the following solves the simple problem shown by my previous example:
>
>    private Session wrap(final SessionImpl origSession) throws RepositoryException {
>        final WorkspaceImpl workspace = (WorkspaceImpl) origSession.getWorkspace();
>        final RepositoryImpl rep = (RepositoryImpl) origSession.getRepository();
>        return new SessionImpl(rep, origSession.getSubject(), workspace.getConfig()) {
>            public Path getQPath(String path)
>                    throws MalformedPathException, IllegalNameException, NamespaceException {
>                // this is the only relevant part:
>                return super.getQPath(Normalizer.normalize(path, Normalizer.Form.NFC));
>            }
>        };
>    }
>
> If there was a way to swap the session implementation or the
> Name-and/or-PathResolver implementations that are used by default, I might
> give this a spin.
>
> Any opinions about the whole problem?
>
> Cheers,
>
> -g
>
> On Nov 4, 2009, at 6:11 PM, Grégory Joseph wrote:
>
>> Hi list,
>>
>> Given the following code,
>> import java.text.Normalizer;
>> ...
>>
>>       final Session session = ...
>>
>>       final Repository rep = session.getRepository();
>>       System.out.println(rep.getDescriptor("jcr.repository.name") + " "
>>               + rep.getDescriptor("jcr.repository.version"));
>>
>>       final Node root = session.getRootNode();
>>       final String name = "föö";
>>       System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFC) = "
>>               + Normalizer.isNormalized(name, Normalizer.Form.NFC)); // true
>>       System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFD) = "
>>               + Normalizer.isNormalized(name, Normalizer.Form.NFD)); // false
>>       root.addNode(name);
>>       session.save();
>>
>>       final Node node1 = root.getNode(name);
>>       System.out.println("node1 = " + node1);
>>       final Node node2 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFC));
>>       System.out.println("node2 = " + node2);
>>       final Node node3 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFD)); // fails
>>       System.out.println("node3 = " + node3);
>>
>> There's a good chance fetching node3 won't work. It might depend on the
>> underlying OS and database, but with OS X and Derby it fails. That's not
>> really surprising, given that
>>   Normalizer.normalize(name, Normalizer.Form.NFC).equals(Normalizer.normalize(name, Normalizer.Form.NFD))
>> is NOT true.
>>
>> Now, given that different clients use different normalization forms
>> (Firefox seems to encode URL parameters with NFD, Safari with NFC; Linux
>> seems to favor NFC, the OS X Finder NFD), wouldn't it be a safe bet to
>> normalize all input at the repository level? Or do you consider this
>> something client applications should do?
>>
>> ref: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
>>
>> Thanks for any tip, pointer, idea, feedback or reaction!
>>
>> Cheers,
>>
>> -greg
>>
>>
>
>
>
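
For illustration only, here is a minimal Java sketch of the client-side
approach discussed in this thread. It shows that the NFC and NFD spellings
of "föö" are different strings, and that normalizing to NFC before handing
a name or path to the repository makes them agree. The class name
PathNormalization and the helper toNfc are hypothetical and are not part of
Jackrabbit or JCR; the only API used is java.text.Normalizer from the JDK.

```java
import java.text.Normalizer;

// Hypothetical helper class for illustration; not part of Jackrabbit.
final class PathNormalization {

    private PathNormalization() {
    }

    // Normalize a node name or path to NFC before passing it to the repository.
    // Assumption: the application has decided to store all names in NFC.
    static String toNfc(String path) {
        return Normalizer.normalize(path, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Precomposed form: 'f' followed by two U+00F6 characters.
        String nfc = "f\u00F6\u00F6";
        // Decomposed form: each 'o' is followed by combining diaeresis U+0308.
        String nfd = "fo\u0308o\u0308";

        // The two strings render the same but are not equal as Java strings.
        System.out.println("equal as-is: " + nfc.equals(nfd));
        System.out.println("lengths: " + nfc.length() + " vs " + nfd.length());

        // After normalizing both to NFC they compare equal.
        System.out.println("equal after toNfc: " + toNfc(nfc).equals(toNfc(nfd)));
    }
}
```

A client would call something like root.getNode(PathNormalization.toNfc(name))
on every lookup and every addNode, so that the repository only ever sees one
form. Whether NFC is the right target form is a choice the application has
to make; the sketch only assumes it is applied consistently.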
