lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/search FuzzyQuery.java FuzzyTermEnum.java
Date Wed, 15 Sep 2004 09:54:31 GMT
Doug Cutting wrote:
> goller@apache.org wrote:
> 
>>   QueryParser can now handle minimumSimilarity parameter
>>   of FuzzyQuery; FuzzyQuery extended to allow for non-fuzzy
>>   prefixes.
> 
> 
> This looks great!
> 
> It might also be good if one could set the non-fuzzy prefix length used 
> by the QueryParser.  As it stands, fuzzy queries with large indexes that 
> use QueryParser are so slow they're unusable.  But a default prefix of 
> just a couple of characters would make a huge performance improvement.

That's true. We need the non-fuzzy prefix because we distinguish between full
forms (inflected words as they occur in the documents) and base forms (after a
linguistic analysis) in the index by using prefixes. Wildcard, prefix, and fuzzy
queries work on full forms; FieldQueries work on both.
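
To illustrate what a non-fuzzy prefix means, here is a minimal sketch in plain
Java. It is not the code from the commit: the class name PrefixFuzzySketch, the
method names, and the similarity formula (1 minus the edit distance divided by
the longer remaining suffix) are assumptions made for illustration only. The
point is that a candidate term must match the first prefixLength characters
exactly, so the edit distance is computed only for terms that share the prefix.

```java
public class PrefixFuzzySketch {

    /** Classic dynamic-programming Levenshtein edit distance, using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty string to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a[0..i) to the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Hypothetical similarity in [0, 1]. Candidates that do not share the
     * exact prefix are rejected with 0.0 before any edit distance is computed.
     */
    static double similarity(String query, String candidate, int prefixLength) {
        int p = Math.min(prefixLength, query.length());
        if (!candidate.startsWith(query.substring(0, p))) {
            return 0.0; // cheap rejection: the non-fuzzy prefix differs
        }
        String querySuffix = query.substring(p);
        String candidateSuffix = candidate.substring(p);
        int longest = Math.max(querySuffix.length(), candidateSuffix.length());
        if (longest == 0) {
            return 1.0; // both terms consist of the prefix only
        }
        return 1.0 - (double) levenshtein(querySuffix, candidateSuffix) / longest;
    }
}
```

With a prefix length of 2, "lucene" is compared only against terms starting with
"lu", so a term enumeration could start at that prefix instead of walking the
whole field.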

I will think about extending QueryParser as you proposed (should not be too 
difficult, we only have to find a reasonable syntax), but I am a little bit
under pressure with other stuff. So I do not know when I will find time.
Everyone else may feel free to go ahead.

> Another idea might be to, rather than (or in addition to) limiting the 
> number of expanded terms by similarity, to limit them by number.  So one 
> could keep, e.g., just the top-scoring 100 terms whose score is greater 
> than 0.5, or somesuch.  This way FuzzyQuery would never trigger 
> BooleanQuery.TooManyClauses.  What do you think?

Also sounds reasonable. Of course it does not solve the efficiency problem
of rewriting a FuzzyQuery. Do you think the expensive part is going through
all terms of a field, or is it the Levenshtein computation, or both?
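
The idea of capping the number of expanded terms could look roughly like the
sketch below. It is only an illustration under stated assumptions: TopTermsSketch,
ScoredTerm and keepTop are hypothetical names, not existing Lucene classes, and
the scores are assumed to have been computed already. A bounded min-heap keeps
at most maxTerms candidates whose score is greater than minScore, so the number
of resulting clauses can never exceed maxTerms. Every candidate still has to be
scored first, so the rewriting cost stays the same.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopTermsSketch {

    /** Hypothetical pair of an expanded term and its similarity score. */
    static final class ScoredTerm implements Comparable<ScoredTerm> {
        final String text;
        final double score;

        ScoredTerm(String text, double score) {
            this.text = text;
            this.score = score;
        }

        @Override
        public int compareTo(ScoredTerm other) {
            return Double.compare(score, other.score); // lowest score first
        }
    }

    /** Keeps at most maxTerms candidates with score > minScore, best first. */
    static List<ScoredTerm> keepTop(List<ScoredTerm> candidates,
                                    int maxTerms, double minScore) {
        PriorityQueue<ScoredTerm> heap = new PriorityQueue<>(); // min-heap
        for (ScoredTerm term : candidates) {
            if (term.score <= minScore) {
                continue; // below the similarity threshold
            }
            heap.offer(term);
            if (heap.size() > maxTerms) {
                heap.poll(); // drop the currently weakest term
            }
        }
        List<ScoredTerm> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder()); // highest score first
        return result;
    }
}
```

Calling keepTop(candidates, 100, 0.5) would correspond to the "top-scoring 100
terms whose score is greater than 0.5" from the proposal.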

I hope you like my extensions to PhraseQuery and PhrasePrefixQuery too :-)

regards,
Christoph

