jackrabbit-users mailing list archives

From Alex Lukin <lu...@stu.cn.ua>
Subject Re: How to convert string to legal node name?
Date Tue, 27 Nov 2007 11:12:10 GMT
Why reinvent the wheel?
What is wrong with ISO9075?
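
For reference, here is a rough, standalone sketch of the _xNNNN_ style of escaping that Jukka describes in the quoted message below. It is not the Jackrabbit ISO9075 class: the class name HexNameEscaper and the ILLEGAL character set are assumptions made for illustration, and the real class decides what to escape from XML name rules rather than a fixed list. The sketch also escapes a literal underscore that starts something looking like an escape, so that the round trip stays lossless.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jackrabbit.
final class HexNameEscaper {
    // Assumption: characters treated as illegal in a node name.
    private static final String ILLEGAL = "*/:[]|";
    // Matches text that looks like an escape sequence, e.g. _x002f_.
    private static final Pattern ESCAPED = Pattern.compile("_x([0-9a-fA-F]{4})_");

    static String escape(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // A literal "_xNNNN_" in the input must have its leading
            // underscore escaped, otherwise unescape() would misread it.
            boolean looksEscaped = c == '_'
                    && ESCAPED.matcher(name).region(i, name.length()).lookingAt();
            if (ILLEGAL.indexOf(c) >= 0 || looksEscaped) {
                // NNNN is the UTF-16 code unit in four hex digits.
                out.append(String.format("_x%04x_", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String unescape(String escaped) {
        Matcher m = ESCAPED.matcher(escaped);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escaped, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(escaped, last, escaped.length());
        return out.toString();
    }
}
```

With this sketch, HexNameEscaper.escape("a/b:c") gives "a_x002f_b_x003a_c", and unescape() turns that back into "a/b:c".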

On Tuesday 27 November 2007 11:41:08 Marcel Reutegger wrote:
> The public review of JSR 283 also contains a third approach to dealing
> with those illegal characters. IMO this should be the preferred one because
> it will ensure interoperability.
>
> <jsr-283-public-review>
> 3.6.3 Exposing non-JCR Names
> An implementation that exposes a non-JCR data store through the JCR API may
> encounter names with characters not allowed within JCR names. To allow for
> this, a JCR repository should expose non-JCR characters as private use
> Unicode code point characters according to the following mapping:
>
> Non-JCR character (Unicode code point)   Private use Unicode code point
> * (U+002A)                               U+F02A
> / (U+002F)                               U+F02F
> : (U+003A)                               U+F03A
> [ (U+005B)                               U+F05B
> ] (U+005D)                               U+F05D
> | (U+007C)                               U+F07C
>
> This mapping should be used when a JCR method returns a name containing a
> non-JCR character. The mapping should also be used (in reverse) when a JCR
> method is called with a path or name containing one of the six private use
> code points above.
> </jsr-283-public-review>
>
> Jackrabbit does not yet have a utility that implements this escaping.
> Contributions are welcome! ;)
>
> regards
>   marcel
>
> Jukka Zitting wrote:
> > Hi,
> >
> > On Nov 26, 2007 5:44 PM, Brian Thompson <elephantium@gmail.com> wrote:
> >> In my application, I implemented a custom search/replace method to
> >> filter out illegal characters.  It's pretty simple to write, so I didn't
> >> spend much time looking for a library method to handle it.  AFAIK, the
> >> Jackrabbit API doesn't address this issue.  I could be wrong, though
> >> (correct me if I'm wrong, please, Jackrabbit devs!).
> >
> > There are two classes for this purpose in the jackrabbit-jcr-commons
> > component:
> >
> > org.apache.jackrabbit.util.ISO9075 [1]
> >
> > This class implements the ISO9075 escaping mechanism that the JCR spec
> > uses in the document view serialization format. All invalid name
> > characters are converted to _xNNNN_ sequences, where NNNN is the
> > hexadecimal representation of the Unicode code unit (UTF-16) of the
> > character in question.
> >
> > This escaping format can look a bit surprising if you use the document
> > view export feature, as the _x prefix ends up doubly escaped when
> > exported to XML.
> >
> > org.apache.jackrabbit.util.Text [2]
> >
> > This class implements (among other things) a few variations of the URI
> > escaping mechanism defined in RFC 2396. All invalid (as defined by the
> > escaping method you choose) characters are converted to %NN sequences
> > where NN is the hexadecimal representation of the Unicode code unit
> > (UTF-8) of the character in question.
> >
> > This escaping format can look a bit surprising if you map node names
> > or paths to URIs, as the % prefix ends up doubly escaped.
> >
> > [1] http://jackrabbit.apache.org/api/1.3/org/apache/jackrabbit/util/ISO9075.html
> > [2] http://jackrabbit.apache.org/api/1.3/org/apache/jackrabbit/util/Text.html
> >
> > BR,
> >
> > Jukka Zitting
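
For comparison, the private-use mapping from the quoted section 3.6.3 of the JSR 283 public review is small enough to sketch without any library. This is only an illustration under assumptions: the class name PrivateUseNameMapper and its method names are made up here, and nothing like it ships with Jackrabbit according to Marcel's message. It relies on the observation that each of the six private-use code points in the table is the original code point plus 0xF000.

```java
// Hypothetical helper, not part of Jackrabbit or the JCR API.
final class PrivateUseNameMapper {
    // The six non-JCR characters listed in the quoted table.
    private static final String NON_JCR = "*/:[]|";
    // In the table, each private-use code point is the original + 0xF000.
    private static final int OFFSET = 0xF000;

    // External name -> name that is safe to expose through JCR.
    static String toJcr(String external) {
        StringBuilder out = new StringBuilder(external.length());
        for (int i = 0; i < external.length(); i++) {
            char c = external.charAt(i);
            out.append(NON_JCR.indexOf(c) >= 0 ? (char) (c + OFFSET) : c);
        }
        return out.toString();
    }

    // JCR name -> external name (the mapping applied in reverse).
    static String fromJcr(String jcr) {
        StringBuilder out = new StringBuilder(jcr.length());
        for (int i = 0; i < jcr.length(); i++) {
            char c = jcr.charAt(i);
            char original = (char) (c - OFFSET);
            // Only the six mapped code points are translated back.
            boolean mapped = c >= OFFSET && NON_JCR.indexOf(original) >= 0;
            out.append(mapped ? original : c);
        }
        return out.toString();
    }
}
```

For example, PrivateUseNameMapper.toJcr("a/b") yields "a\uF02Fb", and fromJcr() restores "a/b". Unlike the _xNNNN_ and %NN schemes, every character maps to exactly one character, so the name keeps its length.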



-- 
SY, Alex Lukin
RIPE NIC HDL: LEXA1-RIPE
