lucene-dev mailing list archives

From mark harwood <>
Subject FuzzyQuery scoring
Date Thu, 23 Dec 2004 11:08:06 GMT
Should we change the scoring behaviour of FuzzyQuery?

The current approach of rewriting Foo~ into a large
boolean query means that the scores of matching
documents are heavily diluted.

In my tests a search for Foo returns documents
containing Foo with a score of 1.
A search for Foo~ returns documents containing Foo
with a score of just 0.01 (this was the top score).
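
A gap of this size is what the coordination factor alone would produce if Foo~ expanded to roughly 100 terms and a document matched only one of them. The sketch below reproduces that arithmetic, which DefaultSimilarity computes as overlap / maxOverlap. The class name CoordDilution and the figure of 100 expanded terms are assumptions for illustration; the real expansion size depends on the index, and other factors such as the query norm also affect the final score.

```java
public class CoordDilution {
    // Same arithmetic as DefaultSimilarity.coord: the fraction of
    // query clauses that a document actually matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // Hypothetical expansion size; not taken from the original tests.
        int expandedTerms = 100;

        // Plain term query "Foo": one clause, one match.
        System.out.println("Foo  coord = " + coord(1, 1));

        // Fuzzy query "Foo~": many clauses, only the exact term matches.
        System.out.println("Foo~ coord = " + coord(1, expandedTerms));
    }
}
```

Under these assumptions the fuzzy query's score is scaled down by the number of expanded terms, even though the document matches Foo exactly.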

I know Lucene scoring isn't guaranteed to consistently
return values in the range of 0 to 1, but I think we
should make some attempt to avoid scoring
inconsistencies like the one above.

To this end, I have tried changing FuzzyQuery to
internally use the class below, which ignores the
coordination factor in scores (the factor based on how
many of the query's terms a document matches):

class FuzzyBooleanQuery extends BooleanQuery {
  public Similarity getSimilarity(Searcher searcher) {
    return new DefaultSimilarity() {
      // Ignore coordination so unmatched expansion terms do not dilute scores.
      public float coord(int overlap, int maxOverlap) {
        return 1.0f;
      }
    };
  }
}

This seems to produce more realistic scores and looks
to preserve the same sort order. 
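
One way to check the sort-order observation is to compare rankings with and without the coordination factor. The sketch below is a standalone illustration and does not use Lucene; the class name CoordOrderCheck, the raw scores, and the overlap counts are all hypothetical. It assumes that when every document matches the same number of expanded terms, coord is a constant multiplier and cannot change the order, whereas documents matching different numbers of terms could be reordered.

```java
import java.util.Arrays;
import java.util.Comparator;

public class CoordOrderCheck {
    // Returns document indices sorted by descending score.
    static Integer[] ranking(float[] scores) {
        Integer[] ids = new Integer[scores.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return ids;
    }

    // Multiplies each raw score by coord = overlap / maxOverlap.
    static float[] withCoord(float[] raw, int[] overlap, int maxOverlap) {
        float[] out = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] * overlap[i] / (float) maxOverlap;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {0.9f, 0.5f, 0.7f};   // hypothetical scores without coord
        int maxOverlap = 100;                // hypothetical expansion size

        // Every document matches exactly one expanded term.
        int[] uniform = {1, 1, 1};
        // The second document matches two expanded terms.
        int[] mixed = {1, 2, 1};

        System.out.println("no coord:      " + Arrays.toString(ranking(raw)));
        System.out.println("uniform coord: "
            + Arrays.toString(ranking(withCoord(raw, uniform, maxOverlap))));
        System.out.println("mixed coord:   "
            + Arrays.toString(ranking(withCoord(raw, mixed, maxOverlap))));
    }
}
```

If real result sets contain documents that match several fuzzy variants at once, it may be worth comparing rankings on those before concluding that the order is always preserved.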

Any views?

