Mailing-List: contact users-help@jackrabbit.apache.org; run by ezmlm
Reply-To: users@jackrabbit.apache.org
Message-ID: <2e81aa371296e5baa16c9720a90e82e3.squirrel@webmail.charlie.brooking.id.au>
Date: Fri, 18 Sep 2009 16:16:08 +1000
Subject: Escaping/encoding of paths/names/values
From: "Charles Brooking"
To: users@jackrabbit.apache.org
User-Agent: SquirrelMail/1.4.19
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Virus-Checked: Checked by
 ClamAV on apache.org

Hi all,

In tackling the issue of escaping/encoding of paths, names, and values in the context of a JCR-based web application, I've discovered it's not so simple. From my searching at least, there is little information online to help, so I thought I'd write up my understanding so far, and perhaps others can chip in (most likely to correct me).

There are utility methods for escaping/encoding in the org.apache.jackrabbit.util.ISO9075 and org.apache.jackrabbit.util.Text classes. Although developed under Jackrabbit, they are part of the JCR Commons module, which only depends on the JCR API.

If you're building a path from user-supplied names, you need to escape illegal JCR characters (e.g. item:1 becomes item%3A1):

    String path = "/foo/" + Text.escapeIllegalJcrChars(name);

Such paths are useful for JCR methods like Session.getItem(...) etc. (Related to this: is there a utility to escape illegal JCR characters in paths as opposed to just names?)

If you want to use paths in XPath queries, though, you need to escape according to ISO9075 rules (e.g. 1hr0 becomes _x0031_hr0):

    String query = "/jcr:root" + ISO9075.encodePath(node.getPath())
        + "/" + ISO9075.encode(name);

For a user-supplied string, this could lead to something like ISO9075.encode(Text.escapeIllegalJcrChars(name)).

For values inserted into queries, you should escape them to prevent incorrect values and query injection. Generally, if you enclose values in single quotes, you just need to replace any literal single quote character with '' (two consecutive single quote characters). There is also a Text.escapeIllegalXpathSearchChars(...) method you should use for calls to jcr:contains(...):

    // q is the user's search term, itemID the value to match
    String query = "/jcr:root/foo/element(*, foo)"
        + "[jcr:contains(@title, '"
        + Text.escapeIllegalXpathSearchChars(q).replaceAll("'", "''")
        + "')]"
        + "[@itemID = '" + itemID.replaceAll("'", "''") + "']";

There are further encoding/decoding methods in the Text class for dealing with URIs in a webapp.
And this is where I get really confused: the JCR encoding scheme mimics the percent-encoding used in URIs but is only said to be "loosely modeled after URI encoding". What is the recommended approach for converting between URI paths and JCR paths?

Apologies if I've missed any existing online guides about this. Hopefully we can put together a nice page based on examples like the ones above.

Later
Charlie
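
For reference, the quote-doubling rule for query values described above can be sketched in plain Java, without any Jackrabbit dependency. This is only an illustration: the class XPathLiterals and its methods quote and equalsPredicate are hypothetical names, not part of Jackrabbit or the JCR API, and the sketch assumes the attribute name comes from trusted code rather than from the user.

```java
// Hypothetical helper for illustration only; not part of Jackrabbit or the JCR API.
final class XPathLiterals {

    private XPathLiterals() {
    }

    /** Wraps a value in single quotes, doubling every embedded single quote. */
    static String quote(String value) {
        // String.replace does a literal (non-regex) replacement of every occurrence.
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Builds an attribute-equality predicate such as [@itemID = 'abc'].
     * Assumption: the attribute name is trusted and is not escaped here.
     */
    static String equalsPredicate(String attribute, String value) {
        return "[@" + attribute + " = " + quote(value) + "]";
    }

    public static void main(String[] args) {
        // A value that tries to break out of the string literal stays inside it.
        System.out.println(equalsPredicate("itemID", "x' or '1' = '1"));
        // prints: [@itemID = 'x'' or ''1'' = ''1']
    }
}
```

Keeping the quoting in one small method means every value goes through the same rule, rather than relying on a replaceAll call being repeated correctly at each concatenation site. Search terms passed to jcr:contains(...) would still need Text.escapeIllegalXpathSearchChars(...) applied first, as in the query above.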