lucene-java-user mailing list archives

From Em <mailformailingli...@yahoo.de>
Subject Re: Custom lucene scoring - Dot product between field boost and query boost
Date Tue, 21 Feb 2012 16:07:26 GMT
Hi Yuval,

> 1. Performance: I am calculating all the TF/IDF values and norms for
> nothing...
You aren't computing much: your Similarity returns constants for all of
those factors. What exactly are you worried about?

> 2. The score I get from the TopScoreDocCollector is not the same as the one I
> get from the Explanation.
> Here is part of my code:
Could you please show us the code where you set the Similarity?

Kind regards,
Em

On 21.02.2012 16:18, Yuval Kesten wrote:
> Hi,
> I want to use Lucene with the following scoring logic:
> When I index my documents I want to set for each field a score/weight.
> When I query my index I want to set for each query term a score/weight.
> 
> I will NEVER index or query with multiple instances of the same field: each query (or
> document) will contain at most one instance of any given field name.
> My fields and query terms are not analyzed; each already consists of a single token.
> 
> I want the score to be simply the dot product between the query's field scores and the
> document's field scores, counting a field only when its value is the same in both.
> 
> For example:
> Query:
> Field Name | Field Value | Field Score
> 1          | AA          | 0.1
> 7          | BB          | 0.2
> 8          | CC          | 0.3
> 
> Document 1:
> Field Name | Field Value | Field Score
> 1          | AA          | 0.2
> 2          | DD          | 0.8
> 7          | CC          | 0.999
> 10         | FFF         | 0.1
> 
> Document 2:
> Field Name | Field Value | Field Score
> 7          | BB          | 0.3
> 8          | CC          | 0.5
> 
> The scores should be:
> Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
> Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q * FIELD_8_SCORE_D2
>             = (0.2 * 0.3) + (0.3 * 0.5) = 0.21
> 
> What would be the best way to implement this, in terms of both accuracy and performance?
> (I don't need TF and IDF calculations.)
> 
> I currently implement it by setting boosts on the fields and query terms.
> Then I overrode the DefaultSimilarity class:
> 
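> import org.apache.lucene.index.FieldInvertState;
> import org.apache.lucene.search.DefaultSimilarity;
> 
> public class MySimilarity extends DefaultSimilarity {
> 
>     // Use only the index-time boost as the norm; no length normalization.
>     @Override
>     public float computeNorm(String field, FieldInvertState state) {
>         return state.getBoost();
>     }
> 
>     // Disable query normalization.
>     @Override
>     public float queryNorm(float sumOfSquaredWeights) {
>         return 1;
>     }
> 
>     // Ignore term frequency.
>     @Override
>     public float tf(float freq) {
>         return 1;
>     }
> 
>     // Ignore inverse document frequency.
>     @Override
>     public float idf(int docFreq, int numDocs) {
>         return 1;
>     }
> 
>     // Ignore the coordination factor.
>     @Override
>     public float coord(int overlap, int maxOverlap) {
>         return 1;
>     }
> 
> }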
> 
> And based on http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html
> this should work.
> Problems:
> 1. Performance: I am calculating all the TF/IDF values and norms for nothing...
> 2. The score I get from the TopScoreDocCollector is not the same as the one I get from the Explanation.
> Here is part of my code:
> 
> // Open a read-only reader and collect the top iTopN hits by score.
> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
> TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
> indexSearcher.search(query, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> for (int i = 0; i < hits.length; ++i) {
>     int docId = hits[i].doc;
>     Document d = indexSearcher.doc(docId);
>     double score = hits[i].score;  // score reported by the collector
>     String id = d.get(FIELD_ID);
>     // Score breakdown for the same document, to compare with hits[i].score.
>     Explanation explanation = indexSearcher.explain(query, docId);
> }
> 
> Thanks!
> 
> 
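
For reference, the target scoring from the quoted example can be sketched in plain Java, independent of Lucene. This is only an illustration of the intended arithmetic, not a Lucene implementation: the class DotProductScore, the type WeightedField, and the helper fields(...) are hypothetical names made up for this sketch. It can serve as a baseline to compare against the scores Lucene returns.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class DotProductScore {

    /** Hypothetical helper type: a single-token field value and its weight. */
    public static final class WeightedField {
        final String value;
        final double weight;

        WeightedField(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /**
     * Sums queryWeight * docWeight over every field name present in both maps
     * whose values are equal. Missing or differing fields contribute nothing.
     */
    public static double score(Map<String, WeightedField> query,
                               Map<String, WeightedField> doc) {
        double sum = 0.0;
        for (Map.Entry<String, WeightedField> entry : query.entrySet()) {
            WeightedField q = entry.getValue();
            WeightedField d = doc.get(entry.getKey());
            if (d != null && d.value.equals(q.value)) {
                sum += q.weight * d.weight;
            }
        }
        return sum;
    }

    /** Builds a field map from (name, value, weight) triples given as strings. */
    public static Map<String, WeightedField> fields(String... triples) {
        Map<String, WeightedField> map = new LinkedHashMap<>();
        for (int i = 0; i + 2 < triples.length; i += 3) {
            double weight = Double.parseDouble(triples[i + 2]);
            map.put(triples[i], new WeightedField(triples[i + 1], weight));
        }
        return map;
    }

    public static void main(String[] args) {
        // Data copied from the example tables in the quoted message.
        Map<String, WeightedField> query =
                fields("1", "AA", "0.1", "7", "BB", "0.2", "8", "CC", "0.3");
        Map<String, WeightedField> doc1 =
                fields("1", "AA", "0.2", "2", "DD", "0.8", "7", "CC", "0.999", "10", "FFF", "0.1");
        Map<String, WeightedField> doc2 =
                fields("7", "BB", "0.3", "8", "CC", "0.5");

        System.out.println(String.format(Locale.ROOT, "Score(q,d1) = %.3f", score(query, doc1)));
        System.out.println(String.format(Locale.ROOT, "Score(q,d2) = %.3f", score(query, doc2)));
    }
}
```

Note that field 7 appears in both the query (BB) and Document 1 (CC), but the values differ, so it adds nothing to Score(q,d1); only field 1 counts there.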


