lucene-java-user mailing list archives

From Christopher Tignor <ctig...@thinkmap.com>
Subject custom scoring help
Date Fri, 02 Apr 2010 14:20:05 GMT
Hello,

I'm having a hard time implementing / understanding a very simple custom
scoring situation.

I have created a Similarity class for testing that overrides all the
relevant (I think) methods below. Each returns 1, except sloppyFreq(int),
which returns 0, and coord(int, int), which returns 1 / maxOverlap so that
scores are scaled between 0 and 1.

I call writer.setSimilarity(new HashHitSimilarity()) when indexing
and searcher.setSimilarity(new HashHitSimilarity()) when searching.

The similarity is definitely affecting the scoring, but not in the way I
expect. I am looking for a straight average of the hits, i.e.
(total hits for a doc) / (total hits in the search).

With my test search against an index of 6 documents, this formula should
produce the following scores:

0.8387096774193549
0.3548387096774194
0.3548387096774194
0.25806451612903225
0.1935483870967742
0.12903225806451613
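
For illustration, the expected values above are consistent with a search of
31 total hits in which the six documents match 26, 11, 11, 8, 6 and 4 hits.
Those counts are inferred from the decimals, not taken from the index, and
the class and method names are hypothetical. A minimal sketch of the
intended "straight average":

```java
// Sketch only: the hit counts and the total of 31 are inferred from the
// expected scores listed above, not read from the actual index.
final class ExpectedScores {

    // Straight average: hits matched in the document / hits in the search.
    static double score(int docHits, int totalHits) {
        return (double) docHits / totalHits;
    }

    public static void main(String[] args) {
        int totalHits = 31;                     // assumed size of the search
        int[] docHits = {26, 11, 11, 8, 6, 4};  // assumed per-document hits
        for (int hits : docHits) {
            System.out.println(score(hits, totalHits));
        }
    }
}
```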

But the scores appear "stretched", and the search returns the values below
instead. I'm unsure where this "stretching" happens:

0.9078212
0.75977653
0.57541895
0.5670391
0.5223464
0.37150836

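import org.apache.lucene.search.Similarity;

/**
 * Test Similarity that sets every scoring factor except coord to a constant,
 * with the intent that a score depends only on how many query clauses match.
 */
public class HashHitSimilarity extends Similarity {

    private static final long serialVersionUID = 811419737205284733L;

    // Ignore how often a term occurs in a document.
    @Override
    public float tf(float freq) {
        return 1f;
    }

    // Ignore field length.
    @Override
    public float lengthNorm(String fieldName, int numTokens) {
        return 1f;
    }

    // Disable query normalization.
    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return 1f;
    }

    // Divide by the total number of query clauses (overlap is not used here).
    @Override
    public float coord(int overlap, int maxOverlap) {
        return 1f / (float) maxOverlap;
    }

    // Ignore term rarity.
    @Override
    public float idf(int docFreq, int numDocs) {
        return 1f;
    }

    // Sloppy phrase matches contribute nothing.
    @Override
    public float sloppyFreq(int distance) {
        return 0f;
    }

}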
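
To make the expectation concrete, here is a simplified model of how these
hooks would combine under the scoring formula documented for Similarity
(coord * queryNorm * sum over matching terms of tf * idf^2 * boost * norm).
It is a sketch in plain Java, not Lucene's implementation: ScoreModel is a
hypothetical name, boosts are assumed to be 1, and the norm parameter stands
in for whatever field norm is stored in the index.

```java
// Simplified model of the documented scoring formula for an OR query.
// Assumption: term boosts are 1; this is not Lucene's own code.
final class ScoreModel {

    // Same constants as the HashHitSimilarity hooks.
    static float tf(float freq) { return 1f; }
    static float idf(int docFreq, int numDocs) { return 1f; }
    static float queryNorm(float sumOfSquaredWeights) { return 1f; }
    static float coord(int overlap, int maxOverlap) { return 1f / (float) maxOverlap; }

    // overlap = matching clauses, maxOverlap = clauses in the query,
    // norm = stand-in for the stored field norm.
    static float score(int overlap, int maxOverlap, float norm) {
        float sum = 0f;
        for (int i = 0; i < overlap; i++) {
            float idf = idf(1, 1);
            sum += tf(1f) * idf * idf * norm;  // one matching term
        }
        return coord(overlap, maxOverlap) * queryNorm(1f) * sum;
    }

    public static void main(String[] args) {
        // With every factor at 1 the model reduces to overlap / maxOverlap.
        System.out.println(score(26, 31, 1f));
        // A norm other than 1 scales the result.
        System.out.println(score(26, 31, 0.5f));
    }
}
```

In this model a norm of exactly 1 gives overlap / maxOverlap, so any factor
that is not really 1 at search time would move the scores away from the
expected values. Whether that accounts for the "stretching" is not something
the model can confirm.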




-- 
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
