lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <>
Subject Re: WildcardQuery and SpanQuery
Date Wed, 18 Jul 2007 06:52:51 GMT
On Wednesday 18 July 2007 05:58, Cedric Ho wrote:
> Hi everybody,
> We recently need to support wildcard search terms "*", "?" together
> with SpanQuery. It seems that there's no SpanWildcardQuery available.
> After looking into the lucene source code for a while, I guess we can
> either:
> 1. Use SpanRegexQuery, or
> 2. Write our own SpanWildcardQuery, and implements the
> rewrite(IndexReader) method to rewrite the query into a SpanOrQuery
> with some SpanTermQuery.
> Of the two approaches, Option 1 seems to be easier. But I am rather
> concerned about the performance of using regular expression. On the
> other hand, I am not sure if there are any other concerns I am not
> aware of for option 2 (i.e. is there a reason why there's no
> SpanWildcardQuery in the first place?)
> Any advices ?

The basic problem you are facing is that in Lucene
the expansion of the terms is tightly coupled to the generation
of a combination query using the expanded terms.

In contrib/surround the term expansion and query generation
are decoupled using a visitor pattern for the terms. The code is here:

In surround a wild card term can provide either an OR of 
normal term queries, or a SpanOrQuery of span term queries.
This query generation is in class SimpleTerm, which has one method
for a normal boolean OR query over the terms, and one for
a span query for the terms.

In both cases surround uses a regular expression to expand 
the matching terms, but that could be changed to use
another wildcard expansion mechanisms than the ones in
SrndPrefixQuery and SrndTruncQuery, which
are subclasses of SimpleTerm.

With the term expansion and the query combination split,
it is also necessary to limit the maximum number of expanded
terms in another way than Lucene does. In surround the
classes BasicQueryFactory and TooManyBasicQueries are
used for that.

Paul Elschot

> Cedric
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message