lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Partial token matches
Date Wed, 26 Apr 2006 18:30:02 GMT

: I'm sure the guys will chime in, but I think you're in significant danger of
: getting a "too many clauses" exception thrown. Try searching on, say, "an".
: Under the covers, Lucene expands your query to have a clause for *every*
: item in your index that starts with "an", so there's a clause for "an" "ana"
: "anb", "anaa", "anab", ....... The shorter your term, the more there'll be,
: and if there are more than 1024, you'll get the exception above. You can set
: the number of clauses to a bigger number, but that may not scale well.

When using any of the queries that expand into a BooleanQuery, there is
almost always the possibility of hitting TooManyClauses -- but this
approach of using PrefixQuery is definitely safer/faster than a straight
use of WildcardQuery -- at the expense of a bigger index.
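
To make the expansion concrete, here is a rough sketch in plain Java. It does not use the Lucene API: the sorted set is a hypothetical stand-in for the term dictionary, and the class name, method name, and exception type are made up for illustration. It only shows why a short prefix can blow past a clause limit such as the 1024 mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for a term dictionary; this is not the Lucene API.
final class PrefixExpansionSketch {
    // Mirrors the limit of 1024 clauses mentioned in the thread.
    static final int MAX_CLAUSES = 1024;

    /** Collects every indexed term that starts with prefix, one "clause" per term. */
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet jumps to the first term >= prefix in sorted order.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            if (clauses.size() == maxClauses) {
                // Illustrative only; Lucene reports this case as TooManyClauses.
                throw new IllegalStateException("too many clauses for prefix: " + prefix);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
            new TreeSet<>(Arrays.asList("an", "ana", "anb", "band", "can"));
        System.out.println(expand(terms, "an", MAX_CLAUSES)); // prints [an, ana, anb]
        try {
            expand(terms, "an", 2); // artificially low limit to trigger the failure
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The cost is one clause per matching term, so the clause count depends on how many distinct terms share the prefix, not on how many documents match.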

The idea mentioned in this thread is basically the same as an idea Erik
Hatcher has suggested in the past, which I've taken to referring to as
"wildcard term rotating"...
  http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12261.html
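
For readers who don't want to chase the link, here is a rough plain-Java sketch of one way term rotation can work. It assumes "rotating" means indexing every rotation of each term plus an end marker, so that a pattern with a single wildcard becomes a prefix lookup; the linked message may differ in its details. The `$` marker and all class and method names are assumptions made for illustration, not anything from Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: an assumed reading of "wildcard term rotating".
final class TermRotationSketch {
    // Assumed end-of-term marker; it must never occur inside a real term.
    static final char END = '$';

    /** Returns every rotation of term + END, e.g. "cat" -> cat$, at$c, t$ca, $cat. */
    static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    /** Rewrites a pattern containing exactly one '*' into a prefix over the rotations. */
    static String toPrefix(String pattern) {
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException("expected exactly one '*': " + pattern);
        }
        // Rotate so the wildcard falls at the end: "c*t" becomes "t$c" (then any suffix).
        return pattern.substring(star + 1) + END + pattern.substring(0, star);
    }

    public static void main(String[] args) {
        System.out.println(rotations("cat")); // prints [cat$, at$c, t$ca, $cat]
        System.out.println(toPrefix("c*t"));  // prints t$c
        System.out.println(toPrefix("*at"));  // prints at$
    }
}
```

Each term of length n is stored n + 1 times, which is where the bigger index comes from. In exchange, a leading or embedded wildcard turns into a prefix lookup over the rotated terms.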

: Consider writing a filter (see Lucene In Action). The filter will return a
: bitset with a bit turned on for each potential match, and avoid this issue.

Very true -- but at the expense of scoring information (i.e., how many times
does the term appear in the document?) ... it's all a question of
priorities.
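
A toy illustration of that trade-off, again in plain Java rather than the Lucene Filter API: the postings map below is a hypothetical stand-in for an index (term -> docId -> term frequency), and the names are invented for this sketch. The bitset ends up with one bit per matching document and nothing else.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy index; this is not the Lucene Filter API.
final class PrefixFilterSketch {
    /** Sets one bit per document containing any term that starts with prefix. */
    static BitSet matchingDocs(SortedMap<String, Map<Integer, Integer>> postings,
                               String prefix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, Map<Integer, Integer>> e : postings.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            for (int docId : e.getValue().keySet()) {
                bits.set(docId); // the term frequency is dropped here
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        SortedMap<String, Map<Integer, Integer>> postings = new TreeMap<>();
        postings.put("an", new HashMap<>());
        postings.get("an").put(0, 3);   // "an" appears 3 times in doc 0
        postings.put("ana", new HashMap<>());
        postings.get("ana").put(0, 1);
        postings.get("ana").put(2, 5);  // "ana" appears 5 times in doc 2
        postings.put("band", new HashMap<>());
        postings.get("band").put(1, 2);
        System.out.println(matchingDocs(postings, "an", 3)); // prints {0, 2}
    }
}
```

No clause limit applies no matter how many terms match, but docs 0 and 2 come out looking identical even though their term counts differ.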

-Hoss

