lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jones G" <jone...@rediffmail.com>
Subject Re: Re: Scoring without normalization!
Date Thu, 15 Jul 2004 18:43:18 GMT
Thanks. I tried overriding Similarity, returning 1 in lengthNorm and queryNorm and setSimilarity
on IndexSearcher with this.

Query: 1 Found: 1540632
Rank: 1	ID: 8157438	Score: 0.99999994
3.73650457E11 = weight(title:iron in 159395), product of:
  7.0507255 = queryWeight(title:iron), product of:
    7.0507255 = idf(docFreq=10816)
    1.0 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 159395), product of:
    1.0 = tf(termFreq(title:iron)=1)
    7.0507255 = idf(docFreq=10816)
    7.5161928E9 = fieldNorm(field=title, doc=159395)

How do I get rid of QueryWeight, fieldWeight, fieldNorm from the scoring?

I tried modifying TermQuery without much luck.


On Thu, 15 Jul 2004 Doug Cutting wrote :
>Have you looked at:
>
>http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
>
>in particular, at:
>
>http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
>http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#queryNorm(float)
>
>Doug
>
>Jones G wrote:
>>Sadly, I am still running into problems
>>
>>Explain shows the following after the modification.
>>
>>Rank: 1     ID: 11285358    Score: 5.5740864E8
>>5.5740864E8 = product of:
>>   8.3611296E8 = sum of:
>>     8.3611296E8 = product of:
>>       6.6889037E9 = weight(title:iron in 1235940), product of:
>>         0.12621856 = queryWeight(title:iron), product of:
>>           7.0507255 = idf(docFreq=10816)
>>           0.017901499 = queryNorm
>>         5.2994613E10 = fieldWeight(title:iron in 1235940), product of:
>>           1.0 = tf(termFreq(title:iron)=1)
>>           7.0507255 = idf(docFreq=10816)
>>           7.5161928E9 = fieldNorm(field=title, doc=1235940)
>>       0.125 = coord(1/8)
>>     2.7106019E-8 = product of:
>>       1.08424075E-7 = sum of:
>>         5.7318403E-9 = weight(abstract:an in 1235940), product of:
>>           0.03711049 = queryWeight(abstract:an), product of:
>>             2.073038 = idf(docFreq=1569960)
>>             0.017901499 = queryNorm
>>           1.5445337E-7 = fieldWeight(abstract:an in 1235940), product of:
>>             1.0 = tf(termFreq(abstract:an)=1)
>>             2.073038 = idf(docFreq=1569960)
>>             7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
>>         1.0269223E-7 = weight(abstract:iron in 1235940), product of:
>>           0.111071706 = queryWeight(abstract:iron), product of:
>>             6.2046037 = idf(docFreq=25209)
>>             0.017901499 = queryNorm
>>           9.24558E-7 = fieldWeight(abstract:iron in 1235940), product of:
>>             2.0 = tf(termFreq(abstract:iron)=4)
>>             6.2046037 = idf(docFreq=25209)
>>             7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
>>       0.25 = coord(2/8)
>>   0.6666667 = coord(2/3)
>>Rank: 2     ID: 8157438     Score: 2.7870432E8
>>2.7870432E8 = product of:
>>   8.3611296E8 = product of:
>>     6.6889037E9 = weight(title:iron in 159395), product of:
>>       0.12621856 = queryWeight(title:iron), product of:
>>         7.0507255 = idf(docFreq=10816)
>>         0.017901499 = queryNorm
>>       5.2994613E10 = fieldWeight(title:iron in 159395), product of:
>>         1.0 = tf(termFreq(title:iron)=1)
>>         7.0507255 = idf(docFreq=10816)
>>         7.5161928E9 = fieldNorm(field=title, doc=159395)
>>     0.125 = coord(1/8)
>>   0.33333334 = coord(1/3)
>>Rank: 3     ID: 10543103    Score: 2.7870432E8
>>2.7870432E8 = product of:
>>   8.3611296E8 = product of:
>>     6.6889037E9 = weight(title:iron in 553967), product of:
>>       0.12621856 = queryWeight(title:iron), product of:
>>         7.0507255 = idf(docFreq=10816)
>>         0.017901499 = queryNorm
>>       5.2994613E10 = fieldWeight(title:iron in 553967), product of:
>>         1.0 = tf(termFreq(title:iron)=1)
>>         7.0507255 = idf(docFreq=10816)
>>         7.5161928E9 = fieldNorm(field=title, doc=553967)
>>     0.125 = coord(1/8)
>>   0.33333334 = coord(1/3)
>>Rank: 4     ID: 8753559     Score: 2.7870432E8
>>2.7870432E8 = product of:
>>   8.3611296E8 = product of:
>>     6.6889037E9 = weight(title:iron in 2563152), product of:
>>       0.12621856 = queryWeight(title:iron), product of:
>>         7.0507255 = idf(docFreq=10816)
>>         0.017901499 = queryNorm
>>       5.2994613E10 = fieldWeight(title:iron in 2563152), product of:
>>         1.0 = tf(termFreq(title:iron)=1)
>>         7.0507255 = idf(docFreq=10816)
>>         7.5161928E9 = fieldNorm(field=title, doc=2563152)
>>     0.125 = coord(1/8)
>>   0.33333334 = coord(1/3)
>>
>>I would like to get rid of all normalizations and just have TF and IDF.
>>What am I missing?
>>
>>
>>On Thu, 15 Jul 2004 Anson Lau wrote :
>>
>>>If you don't mind hacking the source:
>>>
>>>In Hits.java
>>>
>>>In method "getMoreDocs()"
>>>
>>>
>>>
>>>    // Comment out the following
>>>    //float scoreNorm = 1.0f;
>>>    //if (length > 0 && scoreDocs[0].score > 1.0f) {
>>>    //  scoreNorm = 1.0f / scoreDocs[0].score;
>>>    //}
>>>
>>>    // And just set scoreNorm to 1.
>>>    int scoreNorm = 1;
>>>
>>>
>>>I don't know if u can do it without going to the src.
>>>
>>>Anson
>>>
>>>
>>>-----Original Message-----
>>> From: Jones G [mailto:jones.g@rediffmail.com]
>>>Sent: Thursday, July 15, 2004 6:52 AM
>>>To: lucene-user@jakarta.apache.org
>>>Subject: Scoring without normalization!
>>>
>>>How do I remove document normalization from scoring in Lucene? I just want
>>>to stick to TF IDF.
>>>
>>>Thanks.
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>
>>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message