Hi Toby,
On Nov 5, 2009, at 12:26 AM, Tobias Bocanegra wrote:
> hi,
> i don't think this should be the job of the repository to do
> normalization of the paths. likewise a good filesystem (a case
> sensitive one :-) does no normalization of its paths either.
Since I wrote this yesterday in quite a rush, let me just stress that
I'm only talking about Unicode normalization forms. A filesystem
won't have to bother about that, since it doesn't have a whole slew
of clients that pick one form or the other for no apparent reason.
For "fun", you might want to see this:
http://www.mail-archive.com/bug-bash@gnu.org/msg05818.html
I can see why one would want to distinguish between the two forms in
*values*; in item names, not so much.
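
To make the difference between the two forms concrete, here is a minimal sketch that only uses java.text.Normalizer. The class name NormalizationDemo and the helper sameName are made up for illustration and are not part of any repository API; the sketch just shows that the same visible name is two different strings until both sides are brought to one form.

```java
import java.text.Normalizer;

class NormalizationDemo {

    // Hypothetical helper: treats two item names as equal if their NFC forms match.
    static boolean sameName(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // "föö" written with precomposed characters (already NFC).
        final String nfc = "f\u00f6\u00f6";
        // The same name decomposed: each ö becomes 'o' followed by U+0308.
        final String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD);

        System.out.println("nfc.length() = " + nfc.length());
        System.out.println("nfd.length() = " + nfd.length());
        // A plain comparison sees two different names ...
        System.out.println("nfc.equals(nfd) = " + nfc.equals(nfd));
        // ... while comparing after normalization sees one.
        System.out.println("sameName(nfc, nfd) = " + sameName(nfc, nfd));
    }
}
```

For a value, keeping the original code points can matter; for an item name, the plain equals() result above is what makes a lookup miss.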
Thoughts ?
-g
> 2009/11/4 Grégory Joseph <gregory.joseph@magnolia-cms.com>:
>> fwiw, the following solves the simple problem shown by my previous
>> example:
>>
>> private Session wrap(final SessionImpl origSession) throws RepositoryException {
>>     final WorkspaceImpl workspace = (WorkspaceImpl) origSession.getWorkspace();
>>     final RepositoryImpl rep = (RepositoryImpl) origSession.getRepository();
>>     return new SessionImpl(rep, origSession.getSubject(), workspace.getConfig()) {
>>         public Path getQPath(String path)
>>                 throws MalformedPathException, IllegalNameException, NamespaceException {
>>             // this is the only relevant part:
>>             return super.getQPath(Normalizer.normalize(path, Normalizer.Form.NFC));
>>         }
>>     };
>> }
>>
>> If there was a way to swap the session implementation or the
>> Name-and/or-PathResolver implementations that are used by default,
>> I might give this a spin.
>>
>> Any opinions about the whole problem?
>>
>> Cheers,
>>
>> -g
>>
>> On Nov 4, 2009, at 6:11 PM, Grégory Joseph wrote:
>>
>>> Hi list,
>>>
>>> Given the following code,
>>> import java.text.Normalizer;
>>> ...
>>>
>>> final Session session = ...
>>>
>>> final Repository rep = session.getRepository();
>>> System.out.println(rep.getDescriptor("jcr.repository.name") + " "
>>>         + rep.getDescriptor("jcr.repository.version"));
>>>
>>> final Node root = session.getRootNode();
>>> final String name = "föö";
>>> System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFC) = "
>>>         + Normalizer.isNormalized(name, Normalizer.Form.NFC)); // true
>>> System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFD) = "
>>>         + Normalizer.isNormalized(name, Normalizer.Form.NFD)); // false
>>> root.addNode(name);
>>> session.save();
>>>
>>> final Node node1 = root.getNode(name);
>>> System.out.println("node1 = " + node1);
>>> final Node node2 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFC));
>>> System.out.println("node2 = " + node2);
>>> final Node node3 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFD)); // fails
>>> System.out.println("node3 = " + node3);
>>>
>>> There's a good chance fetching node3 won't work. It might be
>>> dependent on the underlying OS and database, but in the case of OSX
>>> and Derby, this fails. It's not that surprising, really, given that
>>> Normalizer.normalize(name, Normalizer.Form.NFC)
>>>     .equals(Normalizer.normalize(name, Normalizer.Form.NFD))
>>> is NOT true.
>>>
>>> Now, taking into account the fact that all sorts of clients will use
>>> a different normalization form (Firefox seems to encode URL
>>> parameters with NFD, Safari with NFC; Linux NFC, OSX Finder seems to
>>> favor NFD), wouldn't it be a safe bet to normalize all input at
>>> repository level? Or do you consider this something client
>>> applications should do?
>>>
>>> ref: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
>>>
>>> Thanks for any tip, pointer, idea, feedback or reaction !
>>>
>>> Cheers,
>>>
>>> -greg
>>>
>>>
>>
>>
>>