lucene-dev mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject FuzzyQuery scoring
Date Thu, 23 Dec 2004 11:08:06 GMT
Should we change the scoring behaviour of FuzzyQuery?

The current approach of rewriting Foo~ into a large
boolean query means that scores for matching documents
are heavily diluted.

In my tests a search for Foo returns documents
containing Foo with a score of 1.
A search for Foo~ returns documents containing Foo
with a score of just 0.01 (this was the top score).
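
A rough sketch of where a figure like that could come from, assuming
the default coord of overlap / maxOverlap and an expansion of Foo~ into
100 terms (a hypothetical number picked to match the 0.01 above, not
something I measured). CoordSketch and its methods are illustrative
names, not Lucene classes:

```java
public class CoordSketch {
    // Mirrors the default coord calculation: the fraction of query
    // clauses that the document matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // The alternative discussed in this thread: ignore the clause count.
    static float flatCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int expandedTerms = 100; // assumption: Foo~ expanded to 100 terms
        int matchedTerms = 1;    // the document contains only "Foo"
        float rawScore = 1.0f;   // hypothetical score before coord is applied

        System.out.println("default coord: "
                + rawScore * defaultCoord(matchedTerms, expandedTerms));
        System.out.println("flat coord:    "
                + rawScore * flatCoord(matchedTerms, expandedTerms));
    }
}
```

With these assumed numbers the only difference between the two scores
is the 1/100 factor contributed by coord.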

I know Lucene scoring isn't guaranteed to consistently
return values in the range of 0 to 1 but I think we
should make some attempt to avoid scoring
inconsistencies like the one above.

To this end, I have tried changing FuzzyQuery to
internally use this class, which ignores the coordination
factor in scores (the fraction of query terms that a
document matches):

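import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Similarity;

// BooleanQuery for the expanded fuzzy terms whose scores ignore coord.
class FuzzyBooleanQuery extends BooleanQuery
{
  public Similarity getSimilarity(Searcher searcher)
  {
     return new DefaultSimilarity(){
       // Always 1, so the number of expanded terms no longer dilutes scores.
       public float coord(int overlap, int maxOverlap)
       {
          return 1.0f;
       }
     };
  }
}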

This seems to produce more realistic scores and appears
to preserve the same sort order.
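
A small sketch of why the order would be expected to hold, and when it
might not, assuming coord is overlap / maxOverlap. If every document
matches exactly one of the expanded terms, coord is the same constant
for all of them and dropping it only rescales the scores. If some
document matches several variants, its coord is larger and the order
could differ. The scores and overlaps below are made-up values and
SortOrderSketch is an illustrative name, not part of Lucene:

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortOrderSketch {
    // Returns document indices ordered by descending score.
    static Integer[] rank(float[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return order;
    }

    // Applies a coord of overlap / maxOverlap to each raw score.
    static float[] withCoord(float[] raw, int[] overlap, int maxOverlap) {
        float[] out = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] * overlap[i] / (float) maxOverlap;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {0.9f, 0.5f, 0.7f};  // hypothetical scores without coord
        int[] oneTermEach = {1, 1, 1};     // each document matches one variant
        int[] mixed = {1, 3, 1};           // document 1 matches three variants

        System.out.println(Arrays.toString(rank(raw)));
        System.out.println(Arrays.toString(rank(withCoord(raw, oneTermEach, 50))));
        System.out.println(Arrays.toString(rank(withCoord(raw, mixed, 50))));
    }
}
```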

Any views?

Cheers
Mark


	
	
		