From: Paul Elschot
To: lucene-user@jakarta.apache.org
Subject: Syntax for query parsers
Date: Wed, 9 Jun 2004 20:49:24 +0200

On Wednesday 09 June 2004 15:39, Erik Hatcher wrote:
> On Jun 9, 2004, at 8:53 AM, Terry Steichen wrote:
> > 3) Is there a plan for adding QueryParser support for the SpanQuery
> > family?
>
> Another important facet to Terry's question here is what syntax to use
> to express all various types of queries?  I suspect that Google stats
> show us that most folks query with 1 - 3 words and do not use any
> of the advanced features.
>
> The elegance of the query syntax is quite important, and QueryParser
> has gotten a bit hairy.  I would enjoy discussions on creating new
> query parsers (one size doesn't fit all, I don't think) and what syntax
> should be used.
>
> Paul Elschot created a "surround" query parser that he posted about to
> the list in April.
>
>     Erik

Here is a bit about the syntax for Surround (mostly taken from the
posted tgz file). Basically, one has to use an operator for everything,
including AND and OR. I don't expect this to be used for normal web
searches; the target audience is one that wants to use span queries,
boolean operators, and truncations.

Surround consists of operators (uppercase or lowercase):

AND/OR/NOT/nW/nN/() as infix and
AND/OR/nW/nN as prefix.

Distance operators W and N have default n=1, max 99.
They are implemented as SpanQuery with slop = (n - 1).
An example prefix form is:

20n(aa*, bb*, cc*)

The name Surround was chosen because of this prefix form and because
it uses the newly introduced span queries to implement the proximity
operators. The names of the operators and the prefix and suffix forms
have been borrowed from various other query languages described on
the internet.

AND/OR/NOT are mapped to Lucene's BooleanQuery.

Query terms from the Lucene standard query parser:

field:termtext
^ boost
* internal and suffix truncation
? one character

Some examples:

aa
aa and bb
aa and bb or cc         same effect as: (aa and bb) or cc
aa NOT bb NOT cc        same effect as: (aa NOT bb) NOT cc

and(aa,bb,cc)           aa and bb and cc
99w(aa,bb,cc)           ordered span query with slop 98
99n(aa,bb,cc)           unordered span query with slop 98

20n(aa*,bb*)
3w(a?a or bb?, cc*)

title: text: aa
title : text : aa or bb
title:text: aa not bb
title:aa not text:bb    this parses as: title:(aa not text:bb)

cc 3w dd                infix: dual.

cc N dd N ee            same effect as: (cc N dd) N ee

text: aa 3d bb          the field applies to the rest of the query.

The OR operator can be used in subqueries for N and W.

Finally, double quotes can be used to search for any single term.
This is different from Lucene, where double quotes are used for phrases.

Regards,
Paul
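
The distance-operator rule described in this message (nW is ordered, nN is unordered, n defaults to 1 with a maximum of 99, and slop = n - 1) can be sketched in a few lines of Python. This is only an illustration, not Surround's actual parser: the function name `parse_distance_operator` and the regular expression are assumptions made for the example, and nothing here constructs a Lucene query object.

```python
import re

# Hypothetical helper, not part of Surround or Lucene. It only illustrates
# the rule stated in the post: nW / nN, default n = 1, max 99, slop = n - 1.
# Assumption: n is written without a leading zero (1..99).
_DISTANCE_OP = re.compile(r"([1-9][0-9]?)?([wWnN])")


def parse_distance_operator(token):
    """Return (ordered, slop) for a distance operator such as '3w' or 'N'."""
    match = _DISTANCE_OP.fullmatch(token)
    if match is None:
        raise ValueError("not a distance operator: %r" % (token,))
    # A missing count means the default distance n = 1.
    n = int(match.group(1)) if match.group(1) else 1
    # W requests an ordered span query, N an unordered one.
    ordered = match.group(2).lower() == "w"
    # The slop passed to the span query is n - 1.
    return ordered, n - 1


if __name__ == "__main__":
    for op in ("99w", "99n", "3W", "n"):
        print(op, parse_distance_operator(op))
```

Under these assumptions, `99w` maps to an ordered query with slop 98 and a bare `n` to an unordered query with slop 0, which matches the examples in the post.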