lucene-java-user mailing list archives

From markharw00d <markharw...@yahoo.co.uk>
Subject Re: FuzzyLikeThisQuery what does maxNumTerms mean
Date Wed, 09 May 2007 21:09:50 GMT
The shortlisting isn't based on stop words. Instead, a score is produced
for each candidate term and used to prioritise term selections. The
score uses the IDF (inverse document frequency) of the original term and
mixes in the "edit-distance" for each of the fuzzy variations of that
term. Care is taken to ensure that, in the query produced, fuzzy
variants all use the root term's IDF (or, if the root term is not in the
index, each variant uses the average IDF of all of the variants). This
stops the rarer variants from ranking more highly than the source term
in query results.
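
As a rough illustration of the scoring described above, here is a
minimal standalone sketch. It is not the FuzzyLikeThisQuery source and
does not use the Lucene API: the class, the Variant and ScoredTerm
types, and the method names are all hypothetical. It also assumes a
classic-style IDF formula and assumes that "mixing in" the edit
distance means multiplying the shared IDF by a similarity in [0, 1].

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the shortlisting idea; not a Lucene class.
final class FuzzyTermShortlist {

    /** A fuzzy expansion of a source term, as found in the index. */
    static final class Variant {
        final String text;
        final double similarity; // edit-distance-based similarity, 0..1 (assumed scale)
        final int docFreq;       // number of documents containing this variant

        Variant(String text, double similarity, int docFreq) {
            this.text = text;
            this.similarity = similarity;
            this.docFreq = docFreq;
        }
    }

    /** A variant paired with the score used to prioritise it. */
    static final class ScoredTerm {
        final String text;
        final double score;

        ScoredTerm(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    /** Assumed IDF formula: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /**
     * Scores every variant of one source term with a single shared IDF:
     * the root term's IDF if the root is in the index (rootDocFreq > 0),
     * otherwise the average IDF of all variants.
     */
    static List<ScoredTerm> scoreVariants(int rootDocFreq, List<Variant> variants, int numDocs) {
        double sharedIdf;
        if (rootDocFreq > 0) {
            sharedIdf = idf(rootDocFreq, numDocs);
        } else {
            double sum = 0.0;
            for (Variant v : variants) {
                sum += idf(v.docFreq, numDocs);
            }
            sharedIdf = variants.isEmpty() ? 0.0 : sum / variants.size();
        }
        List<ScoredTerm> scored = new ArrayList<>();
        for (Variant v : variants) {
            // Assumption: the edit distance is mixed in by simple multiplication.
            scored.add(new ScoredTerm(v.text, sharedIdf * v.similarity));
        }
        return scored;
    }

    /** Keeps the maxNumTerms highest-scoring candidates, best first. */
    static List<ScoredTerm> shortlist(List<ScoredTerm> candidates, int maxNumTerms) {
        List<ScoredTerm> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((ScoredTerm t) -> t.score).reversed());
        int keep = Math.max(0, Math.min(maxNumTerms, sorted.size()));
        return new ArrayList<>(sorted.subList(0, keep));
    }
}
```

In this sketch the variants of a single source term share one IDF, so
among themselves they are ordered purely by similarity. The IDF only
changes the outcome when the scored variants of several source terms are
merged into one list before calling shortlist.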

Mark

bhecht wrote:
> Thanks Mark for the detailed explanation.
> So one more question if I may:
> How is the list shortened to include <maxNumTerms> terms only?
> In your example you had 2 stop words which of course are not included in the
> token stream.
> But what happens if you get more than maxNumTerms terms, how are the
> maxNumTerms selected from the list?
> Thanks.
>   


