lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: using MultiFieldQueryParser and WildcardQuery?
Date Fri, 19 May 2006 01:26:40 GMT
I'm pretty sure that just submitting the query will work. You might want to
use the QueryParser(String, Analyzer) form. Don't be put off by the fact
that the String is the "default field", it doesn't make any difference given
that you qualify each term with the field. In fact, you can even use a
PerFieldAnalyzerWrapper for the Analyzer and have different analyzers for
each field "for free".

That said, I really doubt you want to do this if you have much data. I fear
you'll get a TooManyClauses exception. Lucene In Action describes the why of
this, and "the guys" gave me a great service in the thread titled "I just
don't get wildcards at all". The short form is that if you have more than
1024 terms that could be part of this query, you'll get this exception. You
can bump the maximum number of clauses allowed, but that's a solution I'm
leery as I'm sure it'll break sometime, probably just after I release it to
my customers <G>.

The solution I used was to subclass QueryParser and override the wildcard
(and prefix and etc.) methods and return a ConstanScoreQuery with a Filter.
The filter is built up by using a WildcardTermEnum. This may or may not work
in your case, it depends on the characteristics of your data and query and
response time expectations.

Another approach is to use clever indexing. That is, when you index, overlay
(see the synonyms discussion in LIA) index whiter, white, whit, whi, wh, w
in the same position. Then you never have to search for a wildcard, you can
just search for white and hit white and whiter.

Anyway, I really recommend both LIA and the thread above, as well as
searching the archives for wildcards.

Best
Erick

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message