lucene-java-user mailing list archives

From Paul Elschot <>
Subject Syntax for query parsers
Date Wed, 09 Jun 2004 18:49:24 GMT
On Wednesday 09 June 2004 15:39, Erik Hatcher wrote:
> On Jun 9, 2004, at 8:53 AM, Terry Steichen wrote:
> > 3) Is there a plan for adding QueryParser support for the SpanQuery
> > family?
> Another important facet to Terry's question here is what syntax to use
> to express all the various types of queries?  I suspect that Google stats
> show us that most folks query with 1 - 3 words and do not use any
> of the advanced features.
> The elegance of the query syntax is quite important, and QueryParser
> has gotten a bit hairy.  I would enjoy discussions on creating new
> query parsers (one size doesn't fit all, I don't think) and what syntax
> should be used.
> Paul Elschot created a "surround" query parser that he posted about to
> the list in April.
> 	Erik

Here is a bit about the syntax for Surround (mostly taken from the
posted tgz file). Basically one has to use an operator for everything,
including AND and OR. I don't expect this to be used for normal
web searches; the target audience is users who want
to use span queries, boolean operators, and truncations.

Surround consists of operators (uppercase/lowercase):

AND/OR/NOT/nW/nN/() as infix and
AND/OR/nW/nN        as prefix.

The distance operators W and N take an optional distance n (default 1, max 99).
They are implemented as span queries with slop = (n - 1).
An example prefix form is:

20n(aa*, bb*, cc*)
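
To make the n-to-slop mapping concrete, here is a minimal sketch in plain
Java. It is not the Surround implementation: the class DistanceOperator and
its methods are hypothetical names made up for this illustration, and it only
decodes an operator token such as 20n or W, without building any Lucene query.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Surround or Lucene.
final class DistanceOperator {
    // Optional one- or two-digit distance followed by W or N (either case).
    private static final Pattern TOKEN = Pattern.compile("(\\d{1,2})?([wWnN])");

    final int distance;    // n, defaults to 1 when omitted
    final boolean ordered; // true for W (ordered), false for N (unordered)

    private DistanceOperator(int distance, boolean ordered) {
        this.distance = distance;
        this.ordered = ordered;
    }

    static DistanceOperator parse(String token) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a distance operator: " + token);
        }
        int n = (m.group(1) == null) ? 1 : Integer.parseInt(m.group(1));
        if (n < 1) {
            throw new IllegalArgumentException("distance must be 1..99: " + token);
        }
        return new DistanceOperator(n, m.group(2).equalsIgnoreCase("w"));
    }

    // The slop handed to the span query, as described above: n - 1.
    int slop() {
        return distance - 1;
    }
}
```

With this sketch, DistanceOperator.parse("20n") gives an unordered operator
with slop 19, and a bare W gives an ordered operator with slop 0.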

The name Surround was chosen because of this prefix form
and because it uses the newly introduced span queries
to implement the proximity operators.
The names of the operators and the prefix and infix
forms have been borrowed from various other query
languages described on the internet.
AND/OR/NOT are mapped to Lucene's BooleanQuery.

Query terms from the Lucene standard query parser:

^ boost
* internal and suffix truncation
? one character
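
As a rough illustration of what the two truncation characters mean, the
sketch below translates a truncated term into a java.util.regex.Pattern.
The class TruncationPattern is a hypothetical name, and it is an assumption
here that * stands for zero or more characters. A real parser matches
against the terms in the index rather than using a regex like this.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Surround or Lucene.
final class TruncationPattern {
    // Translates '*' to ".*" (assumed: zero or more characters) and
    // '?' to "." (exactly one character); all other text is matched literally.
    static Pattern toRegex(String term) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole candidate term matches the truncated query term.
    static boolean matches(String queryTerm, String candidate) {
        return toRegex(queryTerm).matcher(candidate).matches();
    }
}
```

For example, under these assumptions a?a matches aba but not aa, and bb*
matches any term starting with bb.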

Some examples:

aa and bb
aa and bb or cc        same effect as:  (aa and bb) or cc
aa NOT bb NOT cc       same effect as:  (aa NOT bb) NOT cc

and(aa,bb,cc)          aa and bb and cc
99w(aa,bb,cc)          ordered span query with slop 98
99n(aa,bb,cc)          unordered span query with slop 98

3w(a?a or bb?, cc*)

title: text: aa
title : text : aa or bb
title:text: aa not bb
title:aa not text:bb        this parses as:   title:(aa not text:bb)

cc 3w dd               infix: dual.

cc N dd N ee           same effect as:   (cc N dd) N ee

text: aa 3d bb       the field applies to the rest of the query.
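
The NOT and N examples above group a repeated infix operator from left to
right. A small sketch of that grouping, using a hypothetical LeftAssoc class
that works on strings only and is not how the parser itself is written:

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Surround or Lucene.
final class LeftAssoc {
    // Folds the operands left to right with one repeated operator, adding
    // parentheses around every intermediate result but not the outermost one.
    static String group(String operator, List<String> operands) {
        if (operands.isEmpty()) {
            throw new IllegalArgumentException("at least one operand is required");
        }
        String result = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            String combined = result + " " + operator + " " + operands.get(i);
            boolean last = (i == operands.size() - 1);
            result = last ? combined : "(" + combined + ")";
        }
        return result;
    }
}
```

Calling LeftAssoc.group("NOT", ...) with the operands aa, bb, cc reproduces
the grouping shown in the example list.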

The OR operator can be used in subqueries for N and W.
Finally, double quotes can be used to search for any
single term. This is different from Lucene, where
double quotes are used for phrases.

