lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Partial token matches
Date Wed, 26 Apr 2006 17:45:58 GMT
I'm sure the guys will chime in, but I think you're in significant danger of
getting a "too many clauses" exception (BooleanQuery.TooManyClauses) thrown.
Try searching on, say, "an". Under the covers, Lucene expands your query to
have a clause for *every* term in your index that starts with "an", so
there's a clause for "an", "ana", "anb", "anaa", "anab", ... The shorter your
prefix, the more clauses there'll be, and if there are more than 1024 (the
default limit), you'll get the exception above. You can raise the limit with
BooleanQuery.setMaxClauseCount(), but that may not scale well.
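
To make the expansion concrete, here is a rough sketch in plain Java. It is a toy model, not Lucene code: the class PrefixExpansionSketch, its methods, and the sorted set standing in for the term dictionary are all made up for illustration. Only the 1024 limit comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of prefix expansion. This is NOT Lucene code; all names are hypothetical.
class PrefixExpansionSketch {
    // The clause limit discussed above.
    static final int MAX_CLAUSES = 1024;

    // Collect one "clause" per indexed term that starts with the prefix.
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // In sorted order, all terms sharing a prefix are contiguous, starting at the prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last matching term
            }
            if (clauses.size() >= maxClauses) {
                throw new IllegalStateException("too many clauses: more than " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Fake term dictionary: root plus every extension of up to extraLetters letters a..z.
    static SortedSet<String> extensions(String root, int extraLetters) {
        SortedSet<String> terms = new TreeSet<>();
        List<String> frontier = new ArrayList<>();
        frontier.add(root);
        terms.add(root);
        for (int depth = 0; depth < extraLetters; depth++) {
            List<String> next = new ArrayList<>();
            for (String s : frontier) {
                for (char c = 'a'; c <= 'z'; c++) {
                    next.add(s + c);
                }
            }
            terms.addAll(next);
            frontier = next;
        }
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = extensions("an", 2);
        System.out.println(expand(terms, "ana", MAX_CLAUSES).size()); // 27
        System.out.println(expand(terms, "an", MAX_CLAUSES).size());  // 703
        // One more letter of depth gives 18279 terms under "an", which blows the limit.
        expand(extensions("an", 3), "an", MAX_CLAUSES); // throws IllegalStateException
    }
}
```

Note how dropping one letter from the prefix ("ana" to "an") takes the clause count from 27 to 703 in this fake index, and one more level of terms pushes it past the limit.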

Consider writing a Filter instead (see Lucene in Action). The filter returns
a BitSet with a bit turned on for each document that potentially matches, so
no query clauses are generated and the limit never comes into play.
RegexTermEnum helps a lot here.
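
The idea behind the filter can be sketched the same way. Again this is a toy model in plain Java rather than the Lucene Filter API: PrefixFilterSketch, the postings map (term to document ids), and the sample data are assumptions made up for illustration, and it uses a simple prefix test where a real filter might walk terms with RegexTermEnum.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a prefix filter. This is NOT the Lucene Filter API; all names are hypothetical.
class PrefixFilterSketch {
    // postings maps each term to the ids of the documents that contain it.
    static BitSet bits(SortedMap<String, int[]> postings, String prefix, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        // Walk the matching terms in sorted order and OR their documents into the bitset.
        for (Map.Entry<String, int[]> entry : postings.tailMap(prefix).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // past the last matching term
            }
            for (int docId : entry.getValue()) {
                result.set(docId);
            }
        }
        return result;
    }

    // Made-up four-document index for the example.
    static SortedMap<String, int[]> samplePostings() {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("an", new int[] {0});
        postings.put("ana", new int[] {1, 3});
        postings.put("anb", new int[] {3});
        postings.put("bat", new int[] {2});
        return postings;
    }

    public static void main(String[] args) {
        // Documents 0, 1 and 3 contain a term starting with "an"; document 2 does not.
        System.out.println(bits(samplePostings(), "an", 4)); // {0, 1, 3}
    }
}
```

However many terms match, the result is a single bitset sized to the number of documents, so nothing here grows with the number of matching terms the way the clause list does.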

Try searching the archive for a thread started by me, titled "I just don't
get wildcards at all" for an exposition by the guys on this sort of thing.
That thread centers on wildcard queries, but I'm pretty sure PrefixQuery
suffers from the same issue.

Chris, Erik, Yonik... Do I have this right????

Erick
