Date: Fri, 18 Sep 2009 13:26:43 +0200
Subject: Re: Escaping/encoding of paths/names/values
From: 
Alexander Klimetschek
To: users@jackrabbit.apache.org

On Fri, Sep 18, 2009 at 08:16, Charles Brooking wrote:
> In tackling the issue of escaping/encoding of paths, names, and values in
> the context of a JCR-based web application, I've discovered it's not so
> simple.

Yes ;-) But in practice it shouldn't be a problem, because the rules
are precise. And you already deduced all of them properly! (see below)

> There are utility methods for escaping/encoding in the
> org.apache.jackrabbit.util.ISO9075 and org.apache.jackrabbit.util.Text
> classes. Although developed under Jackrabbit, they are part of the JCR
> Commons module which only depends on the JCR API.
>
> If you're building a path from user-supplied names, you need to escape
> illegal JCR characters (eg item:1 becomes item%3A1):
>
>  String path = "/foo/" + Text.escapeIllegalJcrChars(name);
>
> Such paths are useful for JCR methods like Session.getItem(...) etc.

Correct.

> (Related to this: is there a utility to escape illegal JCR characters in
> paths as opposed to just names?)

No, but in practice you will mostly create just a single node based on
a user-supplied value. If it's a path, you typically split it up anyway
and create the nodes step by step, as there are often other things to
do along the way (e.g. mixin types, properties, etc.).

> If you want to use paths in XPath queries, though, you need to escape
> according to ISO9075 rules (eg 1hr0 becomes _x0031_hr0):
>
>  String query =
>    "/jcr:root" + ISO9075.encodePath(node.getPath()) +
>    "/" + ISO9075.encode(name);

Correct.

> For a user-supplied string, this could lead to something like
> ISO9075.encode(Text.escapeIllegalJcrChars(name)).

Yes, although I haven't seen a need for that combination so far. You
typically run such a query because you already know the node in
question and call getName() on it, or because the paths you search in
are defined by your application and are simple and ASCII-based
(e.g. /home/users).

> For values inserted into the queries, you should do escaping to prevent
> incorrect values and query injection. Generally, if you enclose values in
> single quotes, you just need to replace any literal single quote character
> with '' (two consecutive single quote characters). There is also a
> Text.escapeIllegalXpathSearchChars(...) method you should use for calls to
> jcr:contains(...).
>
>  String q =
>    "/jcr:root/foo/element(*, foo)" +
>    "[jcr:contains(@title, '" +
>    Text.escapeIllegalXpathSearchChars(q).replaceAll("'", "''") + "')]" +
>    "[@itemID = '" + itemID.replaceAll("'", "''") + "']";

Correct.

> There are further encoding/decoding methods in the Text class for dealing
> with URIs in a webapp. And this is where I get really confused: the JCR
> encoding scheme mimics percent-encoding used in URIs but is only said to
> be "loosely modeled after URI encoding". What is the recommended approach
> in converting between URI paths and their mapping to/from JCR paths?

The set of characters allowed in JCR names contains the URI set plus a
few others (e.g. spaces), so the URI set is actually the more
constrained one. Therefore, if you have a valid URI, you can map it
directly onto a JCR path without having to worry about escaping (this
is by design). Going the other way, i.e. when you have a JCR path and
want to create a URI for it, you simply apply plain URI escaping to it
(which often happens anyway).

To make everything simpler in the context of URIs, I suggest you always
create only JCR nodes with names that are valid URIs.

> Apologies if I've missed any existing online guides about this. Hopefully
> we can make a nice page for the based on examples like the ones above.

Good idea, you could put it onto the wiki, maybe on the examples page:
http://wiki.apache.org/jackrabbit/ExamplesPage

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com
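
For a wiki page, two of the rules discussed in this thread can be
sketched with the JDK alone: doubling single quotes in values placed
inside a query string literal, and plain URI escaping when going from a
JCR path to a URI. This is only a rough illustration. The class and
method names (JcrEscapingSketch, quoteLiteral, toUriPath) are made up
for this example and are not part of Jackrabbit or the JCR API, and
using java.net.URI for the escaping is an assumption about what "plain
URI escaping" could look like in a webapp.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for illustration; not part of Jackrabbit or the JCR API. */
final class JcrEscapingSketch {

    private JcrEscapingSketch() {
    }

    /**
     * Wraps a value in single quotes for use as a string literal in a query,
     * replacing each literal single quote with two consecutive single quotes.
     */
    static String quoteLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Applies plain URI escaping to a JCR path. The multi-argument
     * java.net.URI constructor percent-encodes characters that are not
     * legal in a URI path, such as spaces.
     */
    static String toUriPath(String jcrPath) {
        try {
            return new URI(null, null, jcrPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not usable as a URI path: " + jcrPath, e);
        }
    }
}
```

With this sketch, quoteLiteral("O'Brien") gives 'O''Brien' and
toUriPath("/content/my page") gives /content/my%20page. Search terms
passed to jcr:contains(...) would still need
Text.escapeIllegalXpathSearchChars(...) first, as described above.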