lucene-java-user mailing list archives

From Em <mailformailingli...@yahoo.de>
Subject Re: Custom lucene scoring - Dot product between field boost and query boost
Date Wed, 22 Feb 2012 09:16:59 GMT
Hi Yuval,

> 1. Regarding performance: the Similarity class (and my subclass as
> well) gets the IDF, TF and squared-sums calculations as inputs and
> just factors them differently. Even though I ignore the values, they
> are still being computed.

Good point. However, I think these values are relatively cheap to compute
and nothing to worry about unless you can measure that they harm your
performance.

> 2. I have written this code:
>     static {
>         Similarity.setDefault(new MySimilarity());
>     }

What class do you get back when you call getSimilarity on your searcher?

Could you please send us the scores you get together with the
corresponding Explanation output?
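
When comparing those two outputs, it may help to have the expected values at hand. The following is only a minimal plain-Java sketch of the dot product described in your first mail, using the query and the two documents from your example. The class `DotProductScore`, the type `Weighted` and the helper methods are hypothetical names made up for illustration; none of this is Lucene API, and it says nothing about how Lucene computes the score internally.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; none of these names are Lucene API.
class DotProductScore {

    /** One single-token field: its value and the weight attached to it. */
    static final class Weighted {
        final String value;
        final double weight;

        Weighted(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /** Sums queryWeight * docWeight over fields whose name and value both match. */
    static double score(Map<String, Weighted> query, Map<String, Weighted> doc) {
        double sum = 0.0;
        for (Map.Entry<String, Weighted> entry : query.entrySet()) {
            Weighted inDoc = doc.get(entry.getKey());
            if (inDoc != null && inDoc.value.equals(entry.getValue().value)) {
                sum += entry.getValue().weight * inDoc.weight;
            }
        }
        return sum;
    }

    /** The example query: field name -> (value, weight). */
    static Map<String, Weighted> query() {
        Map<String, Weighted> q = new HashMap<String, Weighted>();
        q.put("1", new Weighted("AA", 0.1));
        q.put("7", new Weighted("BB", 0.2));
        q.put("8", new Weighted("CC", 0.3));
        return q;
    }

    /** Document 1 from the example; only field 1 matches the query by value. */
    static Map<String, Weighted> doc1() {
        Map<String, Weighted> d = new HashMap<String, Weighted>();
        d.put("1", new Weighted("AA", 0.2));
        d.put("2", new Weighted("DD", 0.8));
        d.put("7", new Weighted("CC", 0.999));
        d.put("10", new Weighted("FFF", 0.1));
        return d;
    }

    /** Document 2 from the example; fields 7 and 8 both match the query. */
    static Map<String, Weighted> doc2() {
        Map<String, Weighted> d = new HashMap<String, Weighted>();
        d.put("7", new Weighted("BB", 0.3));
        d.put("8", new Weighted("CC", 0.5));
        return d;
    }

    public static void main(String[] args) {
        System.out.printf("Score(q,d1) = %.3f%n", score(query(), doc1()));
        System.out.printf("Score(q,d2) = %.3f%n", score(query(), doc2()));
    }
}
```

Under these assumptions the two scores come out at roughly 0.02 and 0.21, so those are the numbers to look for in both the collector scores and the Explanation output. Field 7 of Document 1 contributes nothing because its value (CC) differs from the query's value (BB).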

Regards,
Em

On 22.02.2012 08:17, Yuval Kesten wrote:
> Hi Em,
> 1. Regarding performance: the Similarity class (and my subclass as well) gets the
> IDF, TF and squared-sums calculations as inputs and just factors them differently. Even
> though I ignore the values, they are still being computed.
> 2. I have written this code:
>     static {
>         Similarity.setDefault(new MySimilarity());
>     }
> This means that I am setting the default similarity before indexing and, obviously,
> before searching.
> Thanks!
> 
> -----Original Message-----
> From: Em [mailto:mailformailinglists@yahoo.de] 
> Sent: Tuesday, February 21, 2012 6:07 PM
> To: java-user@lucene.apache.org
> Subject: Re: Custom lucene scoring - Dot product between field boost and query boost
> 
> Hi Yuval,
> 
>> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for
>> nothing...
> You aren't calculating that much, since you declared all those values as constants. What
> are you worried about?
> 
>> 2. The score I get from the TopScoreDocCollector is not the same as the one I
>> get from the Explanation.
>> Here is part of my code:
> Could you show us the code where you set the Similarity, please?
> 
> Kind regards,
> Em
> 
> On 21.02.2012 16:18, Yuval Kesten wrote:
>> Hi,
>> I want to use Lucene with the following scoring logic:
>> When I index my documents I want to set for each field a score/weight.
>> When I query my index I want to set for each query term a score/weight.
>>
>> I will NEVER index or query with multiple instances of the same field: each query
>> (or document) will contain 0-1 instances of a given field name.
>> My fields and query terms are not analyzed; each already consists of a single token.
>>
>> I want the score to be simply the dot product between the query's field weights and
>> the document's field weights, counting only fields that have the same value in both.
>>
>> For example:
>> Query:
>> Field Name | Field Value | Field Score
>> 1          | AA          | 0.1
>> 7          | BB          | 0.2
>> 8          | CC          | 0.3
>>
>>
>> Document 1:
>> Field Name | Field Value | Field Score
>> 1          | AA          | 0.2
>> 2          | DD          | 0.8
>> 7          | CC          | 0.999
>> 10         | FFF         | 0.1
>>
>>
>> Document 2:
>> Field Name | Field Value | Field Score
>> 7          | BB          | 0.3
>> 8          | CC          | 0.5
>>
>>
>> The scores should be:
>> Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
>> Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q * FIELD_8_SCORE_D2
>>             = (0.2 * 0.3) + (0.3 * 0.5) = 0.21
>>
>> What would be the best way to implement it, in terms of accuracy and performance? (I
>> don't need the TF and IDF calculations.)
>>
>> I currently implement it by setting boosts on the fields and the query terms.
>> Then I overrode the DefaultSimilarity class:
>>
>> public class MySimilarity extends DefaultSimilarity {
>>
>>     @Override
>>     public float computeNorm(String field, FieldInvertState state) {
>>         return state.getBoost();
>>     }
>>
>>     @Override
>>     public float queryNorm(float sumOfSquaredWeights) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float tf(float freq) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float idf(int docFreq, int numDocs) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float coord(int overlap, int maxOverlap) {
>>         return 1;
>>     }
>>
>> }
>>
>> Based on http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html,
>> this should work.
>> Problems:
>> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for nothing...
>> 2. The score I get from the TopScoreDocCollector is not the same as the one I get from
>> the Explanation.
>> Here is part of my code:
>>
>> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
>> TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
>> indexSearcher.search(query, collector);
>> ScoreDoc[] hits = collector.topDocs().scoreDocs;
>> for (int i = 0; i < hits.length; ++i) {
>>     int docId = hits[i].doc;
>>     Document d = indexSearcher.doc(docId);
>>     double score = hits[i].score;
>>     String id = d.get(FIELD_ID);
>>     Explanation explanation = indexSearcher.explain(query, docId);
>> }
>>
>> Thanks!
>>
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
