jackrabbit-users mailing list archives

From Alexander Klimetschek <aklim...@day.com>
Subject Re: Escaping/encoding of paths/names/values
Date Fri, 18 Sep 2009 11:26:43 GMT
On Fri, Sep 18, 2009 at 08:16, Charles Brooking
<public+jackrabbit@charlie.brooking.id.au> wrote:
> In tackling the issue of escaping/encoding of paths, names, and values in
> the context of a JCR-based web application, I've discovered it's not so
> simple.

Yes ;-) But in practice it shouldn't be a problem, because the rules
are precise. And you already deduced all of them properly! (see

> There are utility methods for escaping/encoding in the
> org.apache.jackrabbit.util.ISO9075 and org.apache.jackrabbit.util.Text
> classes. Although developed under Jackrabbit, they are part of the JCR
> Commons module which only depends on the JCR API.
> If you're building a path from user-supplied names, you need to escape
> illegal JCR characters (eg item:1 becomes item%3A1):
>  String path = "/foo/" + Text.escapeIllegalJcrChars(name);
> Such paths are useful for JCR methods like Session.getItem(...) etc.


> (Related to this: is there a utility to escape illegal JCR characters in
> paths as opposed to just names?)

No, but in practice you will mostly just create a single node based on
a user-supplied value or if it's a path, you typically split it up
anyway and create nodes step by step, as there are often other things
to do (eg. mixin types, properties, etc.).
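A minimal sketch of the split-and-escape approach described above, assuming the user-supplied path uses '/' purely as a separator. The class name PathEscaper and the method escapeSegments are hypothetical. The escaper is passed in as a parameter so that a real application can supply Text::escapeIllegalJcrChars; the stand-in used in main only handles ':' and is for illustration only.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class PathEscaper {

    /** Escapes each non-empty segment of a '/'-separated path and rejoins them as an absolute path. */
    public static String escapeSegments(String path, UnaryOperator<String> escaper) {
        return Arrays.stream(path.split("/"))
                .filter(segment -> !segment.isEmpty())  // drop empty segments from leading or doubled slashes
                .map(escaper)                           // escape one name at a time, never the separators
                .collect(Collectors.joining("/", "/", ""));
    }

    public static void main(String[] args) {
        // Stand-in escaper (assumption): in a real application pass Text::escapeIllegalJcrChars instead.
        UnaryOperator<String> standIn = name -> name.replace(":", "%3A");
        System.out.println(escapeSegments("/foo/item:1", standIn));
    }
}
```

The same loop over segments is also the natural place to create the nodes one by one and set mixin types or properties along the way.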

> If you want to use paths in XPath queries, though, you need to escape
> according to ISO9075 rules (eg 1hr0 becomes _x0031_hr0):
>  String query =
>    "/jcr:root" + ISO9075.encodePath(node.getPath()) +
>    "/" + ISO9075.encode(name);


> For a user-supplied string, this could lead to something like
> ISO9075.encode(Text.escapeIllegalJcrChars(name)).

Yes, although I haven't seen a need for that combination so far: you
typically run such a query because you already know the Node in question
and call getName() on it, or the paths you search in are defined by your
application and are simple and ASCII-based (eg. /home/users).

> For values inserted into the queries, you should do escaping to prevent
> incorrect values and query injection. Generally, if you enclose values in
> single quotes, you just need to replace any literal single quote character
> with '' (two consecutive single quote characters). There is also a
> Text.escapeIllegalXpathSearchChars(...) method you should use for calls to
> jcr:contains(...).
>  String query =
>    "/jcr:root/foo/element(*, foo)" +
>    "[jcr:contains(@title, '" +
>    Text.escapeIllegalXpathSearchChars(q).replaceAll("'", "''") + "')]" +
>    "[@itemID = '" + itemID.replaceAll("'", "''") + "']";
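The quote-doubling rule can be wrapped in a small helper so that it is applied the same way everywhere. This is only a sketch in plain Java: the class name XPathLiterals and the method quote are hypothetical and not part of Jackrabbit.

```java
public class XPathLiterals {

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    public static String quote(String value) {
        // String.replace performs a literal replacement, so no regex escaping is involved.
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String itemID = "o'brien";
        String query = "/jcr:root/foo/element(*, foo)[@itemID = " + quote(itemID) + "]";
        System.out.println(query);
    }
}
```

For the argument of jcr:contains(...), the value would first go through Text.escapeIllegalXpathSearchChars(...) and only then through a helper like this one.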


> There are further encoding/decoding methods in the Text class for dealing
> with URIs in a webapp. And this is where I get really confused: the JCR
> encoding scheme mimics percent-encoding used in URIs but is only said to
> be "loosely modeled after URI encoding". What is the recommended approach
> in converting between URI paths and their mapping to/from JCR paths?

The allowed chars for JCR names contain the URI set plus a few others
(eg. spaces). Thus the URI set is actually more constrained.
Therefore, if you have a valid URI, you can map it directly onto a JCR
path without having to worry about escaping (this is by design).

If you go the other way, eg. have a JCR path and want to create a URI
for it, you simply use plain URI escaping for it (which often happens

To make everything simpler in the context of URIs, I suggest you
always create only JCR nodes with names that are valid URIs.
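As a rough illustration of plain URI escaping for a JCR path, here is a sketch that uses only java.net.URI from the JDK. The class name JcrPathUris and both method names are hypothetical. Whether an incoming URI path should be percent-decoded before it is used as a JCR path is an application choice; toJcrPath below simply reverses toUriPath.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class JcrPathUris {

    /** Percent-encodes a JCR path so it can be used as the path part of a URI. */
    public static String toUriPath(String jcrPath) {
        try {
            // The multi-argument constructor quotes characters that are illegal in a URI path.
            return new URI(null, null, jcrPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not usable as a URI path: " + jcrPath, e);
        }
    }

    /** Reverses toUriPath by decoding the percent-escapes in the path. */
    public static String toJcrPath(String uriPath) {
        return URI.create(uriPath).getPath();
    }

    public static void main(String[] args) {
        String encoded = toUriPath("/home/users/john doe");
        System.out.println(encoded);
        System.out.println(toJcrPath(encoded));
    }
}
```

If node names are restricted to valid URI characters in the first place, as suggested above, both methods return their input unchanged.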

> Apologies if I've missed any existing online guides about this. Hopefully
> we can make a nice page for the based on examples like the ones above.

Good idea, you could put it onto the wiki, maybe on the examples page:


Alexander Klimetschek
