lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: score and frequency
Date Sat, 05 Jun 2004 08:50:05 GMT
On Jun 5, 2004, at 1:13 AM, Niraj Alok wrote:
> I want all the titles which have both "ice" and "hockey" to come above 
> the
> rest (to have higher scores)
> Meaning i would wish the results to appear like:
>
> ice hockey
> ice hockey
> ice hockey
> winter Olympics: hockey, ice, medallists
> ice hockey: British Sekonda Superleague Play-Off Championship: finals
> ice age
> National Hockey League
> Cracking the Ice Age
> ground-ice
>
> My overriden similarity class contains just this method:
> public float coord(int overlap, int maxOverlap) {
>
> return 1.0f;
>
> }
>
>

Use IndexSearcher.explain(Query, docId) to see how each factor in the 
scoring equation contributes to a given document's score.

You are better off using DefaultSimilarity's implementation of coord() 
than just returning 1.0.  A constant return value discards the factor 
that rewards a document for matching more of the query terms.  You want 
coord() to return a greater number when overlap is greater.  Look at 
the numbers being passed to coord(): in the cases where both "ice" and 
"hockey" are present, overlap is probably 2.  As a first try, maybe 
just return (float) overlap and see what the results look like.  The 
default implementation already boosts the score of documents that match 
multiple terms, and the explain output should give you the details you 
need to adjust the equation further.
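
To make the difference concrete, here is a minimal standalone sketch of 
the three coord() variants discussed above.  It deliberately does not 
depend on Lucene: the class name CoordSketch and its method names are 
made up for illustration, and only the formula in defaultCoord() mirrors 
what DefaultSimilarity.coord() computes (overlap divided by maxOverlap).

```java
// Standalone illustration only; CoordSketch and its method names are
// hypothetical and are not part of the Lucene API.
final class CoordSketch {

    // Same formula as DefaultSimilarity.coord(): the fraction of the
    // query's terms that the document matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // The override from the original question: every document gets the
    // same factor, no matter how many query terms it matched.
    static float constantCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    // The suggested first try: the raw number of matching terms.
    static float rawOverlapCoord(int overlap, int maxOverlap) {
        return (float) overlap;
    }

    public static void main(String[] args) {
        int maxOverlap = 2; // two query terms: "ice" and "hockey"
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            System.out.println("overlap=" + overlap
                    + " default=" + defaultCoord(overlap, maxOverlap)
                    + " constant=" + constantCoord(overlap, maxOverlap)
                    + " raw=" + rawOverlapCoord(overlap, maxOverlap));
        }
    }
}
```

In a real Similarity subclass only the body of coord() would change.  
The constant variant is the only one of the three that gives a document 
matching one term the same factor as a document matching both.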

	Erik



