lucene-dev mailing list archives

From Christoph Goller <>
Subject Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/search
Date Wed, 15 Sep 2004 09:54:31 GMT
Doug Cutting wrote:
> wrote:
>>   QueryParser can now handle minimumSimilarity parameter
>>   of FuzzyQuery; FuzzyQuery extended to allow for non-fuzzy
>>   prefixes.
> This looks great!
> It might also be good if one could set the non-fuzzy prefix length used 
> by the QueryParser.  As it stands, fuzzy queries with large indexes that 
> use QueryParser are so slow they're unusable.  But a default prefix of 
> just a couple of characters would make a huge performance improvement.

That's true. We need the non-fuzzy prefix because we distinguish between
full forms (inflected words as they occur in the documents) and base forms
(after a linguistic analysis) in the index by using prefixes. Wildcard, prefix,
and fuzzy queries work on full forms, FieldQueries on both ....

I will think about extending QueryParser as you proposed (should not be too 
difficult, we only have to find a reasonable syntax), but I am a little bit
under pressure with other stuff. So I do not know when I will find time.
Everyone else may feel free to go ahead.
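
To illustrate why even a short non-fuzzy prefix helps, here is a minimal
sketch that does not use the Lucene API. It assumes the field's terms are
available as a sorted set of strings; the class and method names
(PrefixFilterSketch, candidates) are hypothetical. Only the terms sharing
the prefix would then need a similarity computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

final class PrefixFilterSketch {

    /**
     * Returns the terms that share the first prefixLength characters of
     * queryTerm. Assumes the set uses natural String ordering, so all terms
     * with a common prefix are adjacent.
     */
    static List<String> candidates(SortedSet<String> terms, String queryTerm, int prefixLength) {
        int len = Math.min(prefixLength, queryTerm.length());
        String prefix = queryTerm.substring(0, len);
        List<String> result = new ArrayList<>();
        // tailSet seeks to the first term >= prefix; stop as soon as a term
        // no longer starts with the prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            result.add(term);
        }
        return result;
    }
}
```

With a prefix length of 0 every term is a candidate, which corresponds to
the slow case described above; a prefix of two characters restricts the
scan to one contiguous slice of the term dictionary.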

> Another idea might be to, rather than (or in addition to) limiting the 
> number of expanded terms by similarity, to limit them by number.  So one 
> could keep, e.g., just the top-scoring 100 terms whose score is greater 
> than 0.5, or somesuch.  This way FuzzyQuery would never trigger 
> BooleanQuery.TooManyClauses.  What do you think?

Also sounds reasonable. Of course, it does not solve the efficiency problem
of rewriting a FuzzyQuery. Do you think the expensive part is going through
all terms of a field, or is it the Levenshtein computation, or both?
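
A rough sketch of the "limit by number" idea, again without the Lucene API.
It keeps at most maxTerms terms whose similarity exceeds a threshold, using
a bounded min-heap. The similarity formula here (1 - distance / longer
length) is an assumption for illustration and not necessarily what
FuzzyQuery computes; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class TopTermsSketch {

    private static final class Scored {
        final String term;
        final double score;
        Scored(String term, double score) { this.term = term; this.score = score; }
    }

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed similarity: 1 - distance / length of the longer string. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** Best maxTerms terms with similarity > minSimilarity, best first. */
    static List<String> topTerms(Iterable<String> terms, String query,
                                 double minSimilarity, int maxTerms) {
        // Min-heap on score: the weakest kept term is evicted first.
        PriorityQueue<Scored> heap =
            new PriorityQueue<>((x, y) -> Double.compare(x.score, y.score));
        for (String term : terms) {
            double s = similarity(term, query);
            if (s <= minSimilarity) continue;
            heap.add(new Scored(term, s));
            if (heap.size() > maxTerms) heap.poll();
        }
        List<Scored> kept = new ArrayList<>(heap);
        kept.sort((x, y) -> Double.compare(y.score, x.score));
        List<String> result = new ArrayList<>();
        for (Scored s : kept) result.add(s.term);
        return result;
    }
}
```

The sketch bounds the number of resulting clauses, but it still visits
every term and runs one distance computation per term, so both costs asked
about above remain in the loop.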

I hope you like my extensions to PhraseQuery and PhrasePrefixQuery too :-)

