jackrabbit-users mailing list archives

From Alexander Klimetschek <aklim...@day.com>
Subject Re: Escaping/encoding of paths/names/values
Date Fri, 18 Sep 2009 11:26:43 GMT
On Fri, Sep 18, 2009 at 08:16, Charles Brooking
<public+jackrabbit@charlie.brooking.id.au> wrote:
> In tackling the issue of escaping/encoding of paths, names, and values in
> the context of a JCR-based web application, I've discovered it's not so
> simple.

Yes ;-) But in practice it shouldn't be a problem, because the rules
are precise. And you already deduced all of them properly! (see
below)

> There are utility methods for escaping/encoding in the
> org.apache.jackrabbit.util.ISO9075 and org.apache.jackrabbit.util.Text
> classes. Although developed under Jackrabbit, they are part of the JCR
> Commons module which only depends on the JCR API.
>
> If you're building a path from user-supplied names, you need to escape
> illegal JCR characters (eg item:1 becomes item%3A1):
>
>  String path = "/foo/" + Text.escapeIllegalJcrChars(name);
>
> Such paths are useful for JCR methods like Session.getItem(...) etc.

Correct.

> (Related to this: is there a utility to escape illegal JCR characters in
> paths as opposed to just names?)

No, but in practice you will mostly just create a single node based on
a user-supplied value or if it's a path, you typically split it up
anyway and create nodes step by step, as there are often other things
to do (eg. mixin types, properties, etc.).
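
A minimal sketch of that split-and-escape approach, for the case where a
whole path does come from the user. The class and its escapeName() method
are hypothetical stand-ins so that the example runs without Jackrabbit on
the classpath; the set of characters it escapes is an assumption, and in a
real application you would call Text.escapeIllegalJcrChars(segment) instead.

```java
import java.util.StringJoiner;

class JcrPathEscaping {
    // Assumption: an illustrative set of characters that are illegal in a JCR name.
    // The authoritative list lives in Text.escapeIllegalJcrChars.
    private static final String ILLEGAL = "%/:[]*|";

    // Hypothetical stand-in for Text.escapeIllegalJcrChars(name).
    static String escapeName(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (ILLEGAL.indexOf(c) >= 0) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Split on '/', escape each segment on its own, and join again,
    // so the path separators themselves are never escaped.
    static String escapePath(String path) {
        StringJoiner joined = new StringJoiner("/");
        for (String segment : path.split("/", -1)) {
            joined.add(escapeName(segment));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapePath("/foo/item:1")); // /foo/item%3A1
    }
}
```

When you create the nodes step by step anyway, the loop body is also the
natural place to add mixin types or properties to each intermediate node.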

> If you want to use paths in XPath queries, though, you need to escape
> according to ISO9075 rules (eg 1hr0 becomes _x0031_hr0):
>
>  String query =
>    "/jcr:root" + ISO9075.encodePath(node.getPath()) +
>    "/" + ISO9075.encode(name);

Correct.
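
To illustrate what the ISO9075 encoding does to a name, here is a
simplified sketch. It is not the Jackrabbit implementation: the class name
is hypothetical, the name-character checks only approximate the XML rules,
and it leaves out the extra escaping of literal "_x" sequences. It only
shows the general idea that a character which is not allowed at its
position is replaced by _xHHHH_. In real code, use ISO9075.encode(...) and
ISO9075.encodePath(...).

```java
class Iso9075Sketch {
    // Approximation of the XML "name start" rule (assumption, not the full XML table).
    static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    // Approximation of the XML "name character" rule.
    static boolean isNameChar(char c) {
        return isNameStart(c) || Character.isDigit(c) || c == '-' || c == '.';
    }

    // Replace every character that is not allowed at its position with _xHHHH_.
    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean allowed = (i == 0) ? isNameStart(c) : isNameChar(c);
            if (allowed) {
                out.append(c);
            } else {
                out.append(String.format("_x%04x_", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("1hr0")); // _x0031_hr0
    }
}
```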

> For a user-supplied string, this could lead to something like
> ISO9075.encode(Text.escapeIllegalJcrChars(name)).

Yes, although I haven't seen a need for that combination so far. You
typically run such a query because you already know the Node in question
and call getName() on it, or because the paths you search in are defined
by your application and are simple and ASCII-based (eg. /home/users).

> For values inserted into the queries, you should do escaping to prevent
> incorrect values and query injection. Generally, if you enclose values in
> single quotes, you just need to replace any literal single quote character
> with '' (two consecutive single quote characters). There is also a
> Text.escapeIllegalXpathSearchChars(...) method you should use for calls to
> jcr:contains(...).
>
>  String query =
>    "/jcr:root/foo/element(*, foo)" +
>    "[jcr:contains(@title, '" +
>    Text.escapeIllegalXpathSearchChars(q).replaceAll("'", "''") + "')]" +
>    "[@itemID = '" + itemID.replaceAll("'", "''") + "']";

Correct.
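
The quote-doubling rule is easy to get wrong when it is repeated inline, so
it can help to wrap it in a small helper. The sketch below is one way to do
that; the class and method names are hypothetical, and it covers only plain
string literals, not the additional escaping needed for jcr:contains(...).

```java
class XPathLiterals {
    // Wrap a value in single quotes and double any embedded single quote,
    // so the value cannot terminate the literal early.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Build the itemID predicate of the query above from an untrusted value.
    static String itemQuery(String itemID) {
        return "/jcr:root/foo/element(*, foo)[@itemID = " + quote(itemID) + "]";
    }

    public static void main(String[] args) {
        System.out.println(quote("O'Reilly")); // 'O''Reilly'
        System.out.println(itemQuery("a' or '1'='1"));
    }
}
```

String.replace is used instead of replaceAll because no regular expression
is needed here.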

> There are further encoding/decoding methods in the Text class for dealing
> with URIs in a webapp. And this is where I get really confused: the JCR
> encoding scheme mimics percent-encoding used in URIs but is only said to
> be "loosely modeled after URI encoding". What is the recommended approach
> in converting between URI paths and their mapping to/from JCR paths?

The allowed chars for JCR names contain the URI set plus a few others
(eg. spaces). Thus the URI set is actually more constrained.
Therefore, if you have a valid URI, you can map it directly onto a JCR
path without having to worry about escaping (this is by design).

If you go the other way, eg. have a JCR path and want to create a URI
for it, you simply use plain URI escaping for it (which often happens
anyway).
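
As a rough sketch of that JCR-path-to-URI direction, the JDK's java.net.URI
can do the plain URI escaping. The class and method names are hypothetical;
the example assumes an absolute JCR path and relies on the multi-argument
URI constructor, which quotes characters that are illegal in a URI path,
including '%'.

```java
import java.net.URI;
import java.net.URISyntaxException;

class JcrPathToUri {
    // Turn an absolute JCR path into an escaped URI path.
    static String toUriPath(String jcrPath) {
        try {
            // scheme, host and fragment are left out; only the path is escaped.
            return new URI(null, null, jcrPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not usable as a URI path: " + jcrPath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUriPath("/content/my page")); // /content/my%20page
    }
}
```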

To make everything simpler in the context of URIs, I suggest you
always create only JCR nodes with names that are valid URIs.

> Apologies if I've missed any existing online guides about this. Hopefully
> we can make a nice page based on examples like the ones above.

Good idea, you could put it onto the wiki, maybe on the examples page:
http://wiki.apache.org/jackrabbit/ExamplesPage

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com
