jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: How to convert string to legal node name?
Date Tue, 27 Nov 2007 09:41:08 GMT
the public review of JSR 283 also describes a third approach to dealing with
those illegal characters. IMO this should be the preferred one because it will
ensure interoperability.

<jsr-283-public-review>
3.6.3 Exposing non-JCR Names
An implementation that exposes a non-JCR data store through the JCR API may 
encounter names with characters not allowed within JCR names. To allow for this, 
a JCR repository should expose non-JCR characters as private use Unicode code 
point characters according to the following mapping:

Non-JCR character (Unicode code point)   Private use Unicode code point
* (U+002A)                                   U+F02A
/ (U+002F)                                   U+F02F
: (U+003A)                                   U+F03A
[ (U+005B)                                   U+F05B
] (U+005D)                                   U+F05D
| (U+007C)                                   U+F07C

This mapping should be used when a JCR method returns a name containing a 
non-JCR character. The mapping should also be used (in reverse) when a JCR 
method is called with a path or name containing one of the six private use code 
points above.
</jsr-283-public-review>

jackrabbit does not yet have a utility that implements this escaping.
contributions are welcome! ;)
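
a minimal sketch of what such a utility might look like, assuming only the six-character table quoted above. The class and method names (PrivateUseNamesSketch, escape, unescape) are hypothetical and are not part of jackrabbit or the JCR API. Each listed character is shifted by 0xF000 into the private use area, and shifted back in the reverse direction:

```java
class PrivateUseNamesSketch {
    // The six non-JCR characters from the table in section 3.6.3.
    private static final String NON_JCR = "*/:[]|";

    // Hypothetical helper: replaces each listed character with its
    // private use counterpart, e.g. '/' (U+002F) becomes U+F02F.
    static String escape(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            sb.append(NON_JCR.indexOf(c) >= 0 ? (char) (c + 0xF000) : c);
        }
        return sb.toString();
    }

    // Hypothetical helper: the reverse mapping. Only the six private use
    // code points from the table are translated back; everything else,
    // including other private use characters, is left untouched.
    static String unescape(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean mapped = c >= 0xF000 && NON_JCR.indexOf((char) (c - 0xF000)) >= 0;
            sb.append(mapped ? (char) (c - 0xF000) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String escaped = escape("report: 2007/11 [draft]");
        // The escaped form contains none of the six non-JCR characters.
        System.out.println(escaped.matches(".*[*/:\\[\\]|].*")); // false
        System.out.println(unescape(escaped));
    }
}
```

the round trip is lossless for any input that does not already contain one of the six private use code points; a real utility would need to decide how to treat those.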

regards
  marcel

Jukka Zitting wrote:
> Hi,
> 
> On Nov 26, 2007 5:44 PM, Brian Thompson <elephantium@gmail.com> wrote:
>> In my application, I implemented a custom search/replace method to filter
>> out illegal characters.  It's pretty simple to write, so I didn't spend much
>> time looking for a library method to handle it.  AFAIK, the Jackrabbit API
>> doesn't address this issue.  I could be wrong, though (correct me if I'm
>> wrong, please, Jackrabbit devs!).
> 
> There are two classes for this purpose in the jackrabbit-jcr-commons component:
> 
> org.apache.jackrabbit.util.ISO9075 [1]
> 
> This class implements the ISO9075 escaping mechanism that the JCR spec
> uses in the document view serialization format. All invalid name
> characters are converted to _xNNNN_ sequences, where NNNN is the
> hexadecimal representation of the Unicode code unit (UTF-16) of the
> character in question.
> 
> This escaping format can look a bit surprising if you use the document
> view export feature, as the _x prefix ends up doubly escaped when
> exported to XML.
> 
> org.apache.jackrabbit.util.Text [2]
> 
> This class implements (among other things) a few variations of the URI
> escaping mechanism defined in RFC 2396. All invalid (as defined by the
> escaping method you choose) characters are converted to %NN sequences,
> one per byte of the character's UTF-8 encoding, where NN is the
> hexadecimal value of that byte.
> 
> This escaping format can look a bit surprising if you map node names
> or paths to URIs, as the % prefix ends up doubly escaped.
> 
> [1] http://jackrabbit.apache.org/api/1.3/org/apache/jackrabbit/util/ISO9075.html
> [2] http://jackrabbit.apache.org/api/1.3/org/apache/jackrabbit/util/Text.html
> 
> BR,
> 
> Jukka Zitting
> 
> 
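
To illustrate the %NN escaping described in the quoted message, here is a rough sketch using only the Java standard library. It is not the implementation in org.apache.jackrabbit.util.Text; the class name PercentEscapeSketch and the choice of which characters to leave unescaped (ASCII letters, digits, '-', '_' and '.') are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;

class PercentEscapeSketch {
    private static final String HEX = "0123456789ABCDEF";

    // Hypothetical helper: writes every byte of the UTF-8 encoding as %NN
    // unless it is an ASCII letter, digit, '-', '_' or '.'.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean keep = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || v == '-' || v == '_' || v == '.';
            if (keep) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX.charAt(v >> 4)).append(HEX.charAt(v & 0xF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String once = escape("a:b");
        System.out.println(once);          // a%3Ab
        // Escaping an already escaped name escapes the '%' itself,
        // which is the double escaping mentioned in the quoted message.
        System.out.println(escape(once));  // a%253Ab
    }
}
```

A non-ASCII character produces one %NN sequence per UTF-8 byte, so a single character can expand to several sequences.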

