jackrabbit-dev mailing list archives

From Tobias Bocanegra <tri...@day.com>
Subject Re: Unicode, NFC,NFD and node names
Date Thu, 05 Nov 2009 14:39:31 GMT
2009/11/5 Grégory Joseph <gregory.joseph@magnolia-cms.com>:
> Hi Toby,
>
> On Nov 5, 2009, at 12:26 AM, Tobias Bocanegra wrote:
>
>> hi,
>> i don't think it should be the job of the repository to normalize
>> paths. likewise, a good filesystem (a case-sensitive one :-) does
>> not normalize its paths either.
>
> Since I wrote this yesterday in quite a rush, let me stress that I'm only
> talking about Unicode normalization forms. A filesystem doesn't have to
> bother with that, since it doesn't have a whole slew of clients that each
> pick one form or the other for no apparent reason. For "fun", you might
> want to see this:
> http://www.mail-archive.com/bug-bash@gnu.org/msg05818.html
>
> I can see why one would want to distinguish between the two forms
> in *values*; in item names, not so much.
well, i see a repository somewhere in between filesystems and databases.

however, i think the path to an item needs to be solid - the search
can still provide you with all the stemming and normalization you need.
regards, toby
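
For readers following the thread: a minimal sketch of the client-side alternative raised in the original question, where the application normalizes names itself before they reach the repository. It uses only java.text.Normalizer. The class name PathNormalization and the helper toNfc are hypothetical and not part of Jackrabbit or the JCR API. The string literals spell "föö" with explicit escapes so the two forms are unambiguous.

```java
import java.text.Normalizer;

class PathNormalization {

    // Hypothetical helper (not a Jackrabbit/JCR API): bring a name or path to NFC
    // so that every client ends up using the same form.
    static String toNfc(String path) {
        return Normalizer.normalize(path, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "föö" with precomposed characters (NFC): f, ö, ö
        String composed = "f\u00F6\u00F6";
        // "föö" with base letters plus combining diaeresis (NFD): f, o, U+0308, o, U+0308
        String decomposed = "fo\u0308o\u0308";

        // The two strings render the same but are different char sequences.
        System.out.println(composed.equals(decomposed));
        System.out.println(composed.length() + " vs " + decomposed.length());

        // After normalizing both to the same form they compare equal.
        System.out.println(toNfc(composed).equals(toNfc(decomposed)));
    }
}
```

With something like this in place, a client would pass toNfc(name) to calls such as addNode and getNode, assuming every writer to the repository follows the same convention.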

>
> Thoughts ?
>
> -g
>
>> 2009/11/4 Grégory Joseph <gregory.joseph@magnolia-cms.com>:
>>>
>>> fwiw, the following solves the simple problem shown by my previous
>>> example:
>>>
>>>   private Session wrap(final SessionImpl origSession) throws RepositoryException {
>>>       final WorkspaceImpl workspace = (WorkspaceImpl) origSession.getWorkspace();
>>>       final RepositoryImpl rep = (RepositoryImpl) origSession.getRepository();
>>>       return new SessionImpl(rep, origSession.getSubject(), workspace.getConfig()) {
>>>           public Path getQPath(String path)
>>>                   throws MalformedPathException, IllegalNameException, NamespaceException {
>>>               // this is the only relevant part:
>>>               return super.getQPath(Normalizer.normalize(path, Normalizer.Form.NFC));
>>>           }
>>>       };
>>>   }
>>>
>>> If there were a way to swap the session implementation, or the Name- and/or
>>> PathResolver implementations that are used by default, I might give this a
>>> spin.
>>>
>>> Any opinions about the whole problem?
>>>
>>> Cheers,
>>>
>>> -g
>>>
>>> On Nov 4, 2009, at 6:11 PM, Grégory Joseph wrote:
>>>
>>>> Hi list,
>>>>
>>>> Given the following code,
>>>> import java.text.Normalizer;
>>>> ...
>>>>
>>>>      final Session session = ...
>>>>
>>>>      final Repository rep = session.getRepository();
>>>>      System.out.println(rep.getDescriptor("jcr.repository.name") + " "
>>>>              + rep.getDescriptor("jcr.repository.version"));
>>>>
>>>>      final Node root = session.getRootNode();
>>>>      final String name = "föö";
>>>>      System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFC) = "
>>>>              + Normalizer.isNormalized(name, Normalizer.Form.NFC)); // true
>>>>      System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFD) = "
>>>>              + Normalizer.isNormalized(name, Normalizer.Form.NFD)); // false
>>>>      root.addNode(name);
>>>>      session.save();
>>>>
>>>>      final Node node1 = root.getNode(name);
>>>>      System.out.println("node1 = " + node1);
>>>>      final Node node2 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFC));
>>>>      System.out.println("node2 = " + node2);
>>>>      final Node node3 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFD)); // fails
>>>>      System.out.println("node3 = " + node3);
>>>>
>>>> There's a good chance fetching node3 won't work. It might depend on the
>>>> underlying OS and database, but in the case of OS X and Derby, this
>>>> fails. It's not that surprising, really, given that
>>>> Normalizer.normalize(name, Normalizer.Form.NFC).equals(Normalizer.normalize(name, Normalizer.Form.NFD))
>>>> is NOT true.
>>>>
>>>> Now, taking into account the fact that all sorts of clients will use
>>>> different normalization forms (Firefox seems to encode URL parameters with
>>>> NFD, Safari with NFC; Linux NFC, the OS X Finder seems to favor NFD),
>>>> wouldn't it be a safe bet to normalize all input at the repository level?
>>>> Or do you consider this something client applications should do?
>>>>
>>>> ref: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
>>>>
>>>> Thanks for any tip, pointer, idea, feedback or reaction !
>>>>
>>>> Cheers,
>>>>
>>>> -greg
>>>>
>>>>
>>>
>>>
>>>
>
>
>
