From: Steven Rowe
Date: Tue, 06 Dec 2005 17:22:13 -0500
To: java-dev@lucene.apache.org
Subject: Re: "Advanced" query language
Message-ID: <43960F15.8080306@syr.edu>

Yonik wrote:
> For normal text data, with valid
> unicode characters that aren't legal
> XML, I'd rather have a simple escaping mechanism. Something like
> backslash escaping that is easily understood. Maybe something as
> simple as \00 for U+0000 (backslash followed by two hex digits).

I agree with your goal of transparency, especially for the cases of human authorship. However, I don't agree with the idea of an application-specific escape syntax. What if someone wants to use the query metacharacter(s) ('\' in your example) literally? The usual answer is to escape the metacharacters, e.g. "\\00" to encode literal "\00". But *especially* for the human-authored cases, introduction of this complexity is less than ideal.

An alternative mechanism could be empty XML elements, e.g.:

Or less verbosely, with a fixed set of element names (and there are 28 of these, right?: [#x00-#x08] | #x0B | #x0C | [#x0E-#x1F]):

-Steve
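
For anyone following along, here is a rough Java sketch of the two points in the message above. It is only an illustration under stated assumptions: the class `EscapeSketch` and its methods are hypothetical names, not part of Lucene or of any proposed query syntax, and the `\hh` decoding rules are one guess at how the suggested backslash scheme might behave. The first part counts the code points in the ranges quoted above ([#x00-#x08] | #x0B | #x0C | [#x0E-#x1F]); counted this way the total comes to 29 rather than 28. The second part shows why a literal backslash has to be doubled once backslash becomes a metacharacter.

```java
public class EscapeSketch {

    /** True for the control code points in the ranges quoted in the message. */
    public static boolean isIllegalXmlControl(int cp) {
        return (cp >= 0x00 && cp <= 0x08)
            || cp == 0x0B
            || cp == 0x0C
            || (cp >= 0x0E && cp <= 0x1F);
    }

    /** Counts the code points below 0x20 that fall in those ranges. */
    public static int countIllegalXmlControls() {
        int count = 0;
        for (int cp = 0x00; cp <= 0x1F; cp++) {
            if (isIllegalXmlControl(cp)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Decodes a hypothetical escape syntax (an assumption, not a real API):
     * a doubled backslash yields one literal backslash, and a backslash
     * followed by two hex digits yields the character with that value.
     */
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '\\') {
                out.append('\\');          // escaped metacharacter
                i += 2;
            } else if (i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0) {
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                throw new IllegalArgumentException("Bad escape at index " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(countIllegalXmlControls());        // prints 29
        // Backslash + "00" decodes to the NUL character.
        System.out.println(unescape("a\\00b").charAt(1) == 0); // prints true
        // A doubled backslash is needed to get the three literal characters \00.
        System.out.println(unescape("\\\\00"));                // prints \00
    }
}
```

The doubled backslash in the last call is exactly the extra burden on human authors that the message objects to: every literal use of the metacharacter has to be escaped as well.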