lucene-java-user mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: AND / OR multiple term queries
Date Fri, 16 May 2003 21:41:45 GMT
Xavier Guardiola wrote:
> Yes that's what it should be, but I forgot to mention that I assign
> different weights to different fields as well as different weights to
> different documents. So I may end up with a doc not having all the terms but
> the highest score.
> That's why I don't see a trivial way of getting the results in the desired
> order (first those with all terms and then the rest)...

Try overriding Similarity.coord(int,int).

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#coord(int,%20int)

You might use something like:

   private static final double POWER = 3.0; // penalty exponent

   public float coord(int overlap, int maxOverlap) {
     int missing = maxOverlap - overlap; // # of query terms missing
     return (float)Math.pow(1.0 / (missing + 1), POWER);
   }

Thus, a hit missing one query term would have its score multiplied by
(1/2)^3 = 1/8, a hit missing two terms by (1/3)^3 = 1/27, and so on.
Adjust POWER to suit.  With a high enough POWER you can pretty much
guarantee that all documents with any missing terms are ranked below any
with all query terms.
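
To get a feel for the numbers before touching the index, here is a rough standalone sketch of the same formula with no Lucene types, so it runs on its own. The class name CoordSketch, the helper minPower, and the three-term query are made up for illustration; in real use the coord method would sit in your Similarity subclass as above. minPower rests on an assumption you would have to measure yourself: that you know roughly the largest ratio by which a partial match's score (before coord) can exceed a full match's score, given your field and document weights.

```java
class CoordSketch {
    // Penalty exponent, same value as in the message above.
    static final double POWER = 3.0;

    // Same formula as the coord override, without any Lucene dependency.
    static float coord(int overlap, int maxOverlap) {
        int missing = maxOverlap - overlap; // number of query terms missing
        return (float) Math.pow(1.0 / (missing + 1), POWER);
    }

    // Hypothetical helper: a hit missing one term is scaled by 2^-POWER,
    // so it drops below a full match once POWER > log2(maxScoreRatio).
    // Hits missing more terms are penalized harder, so this is the tight case.
    static double minPower(double maxScoreRatio) {
        return Math.log(maxScoreRatio) / Math.log(2.0);
    }

    public static void main(String[] args) {
        int maxOverlap = 3; // hypothetical three-term query
        for (int overlap = maxOverlap; overlap >= 1; overlap--) {
            System.out.println("matched " + overlap + " of " + maxOverlap
                    + " terms: factor " + coord(overlap, maxOverlap));
        }
        // If boosts can make a partial match score up to 50x a full match:
        System.out.println("POWER should exceed " + minPower(50.0));
    }
}
```

So if your weights can inflate a partial match by at most a factor of 50 or so, a POWER of 6 (a factor of 1/64 for one missing term) should be enough; treat that as an estimate to check against real queries, not a guarantee.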

Tell me how it goes,

Doug



