lucene-java-user mailing list archives

From Joshua Lewis <jle...@cs.umass.edu>
Subject Re: Re: Scoring without normalization!
Date Thu, 15 Jul 2004 18:53:29 GMT

Hi,

Note that as 
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
points out:

"[T]hese values are computed under IndexWriter.addDocument(Document) and
stored then using encodeNorm(float). Thus they have limited precision,
and documents must be re-indexed if this method is altered."

So you can't get rid of fieldNorm at search time, because it is stored for
each Field in the index itself. You have to reindex with a Similarity whose
lengthNorm returns 1, set on the writer with
IndexWriter.setSimilarity(Similarity s) before adding documents.
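
To illustrate the "limited precision" in the javadoc quoted above, here is a
minimal sketch of a one-byte float format with 3 mantissa bits and 5 exponent
bits. It is modeled on the byte encoding found in later Lucene releases; the
class name NormByteSketch is hypothetical, and the exact bit layout and
rounding used by encodeNorm in this Lucene version are assumptions, not
something this thread confirms.

```java
// Hypothetical stand-in, NOT a Lucene class. Assumes a one-byte float with
// 3 mantissa bits, 5 exponent bits and a zero-exponent point of 15.
class NormByteSketch {

    /** Squeezes a float into one byte by truncating mantissa bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);          // keep sign, exponent, top 3 mantissa bits
        int zeroPoint = (63 - 15) << 3;        // smallest representable exponent
        if (small <= zeroPoint) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;  // zero/negative, or underflow
        }
        if (small >= zeroPoint + 0x100) {
            return (byte) -1;                  // overflow: clamp to the largest byte
        }
        return (byte) (small - zeroPoint);
    }

    /** Expands the byte back into a float; the truncated bits are gone. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(1.0f)));  // a value that survives the round trip
        System.out.println(decode(encode(0.7f)));  // a value that gets truncated
        System.out.println(decode((byte) -1));     // largest representable value
    }
}
```

Under this assumed format, 1.0 survives the round trip, which is why a
lengthNorm of 1 is a safe choice. The largest byte decodes to 7.5161928E9,
which happens to match the fieldNorm shown in the explain output further
down, so that index may hold norm bytes that were not written by the
Similarity in use at search time. That reading is a guess from the numbers,
not something confirmed in the thread.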

If you want to stick to tf-idf, then you can override coord, queryNorm,
and lengthNorm, but again, lengthNorm can only be overridden at index time.
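
The arithmetic behind a single-term weight in the explain output below can
be sketched in plain Java. The class TermScoreSketch and its method names
are hypothetical stand-ins that follow the labels in the explain output;
they are not Lucene classes. The square-root tf is read off the quoted
output (tf=2.0 for termFreq=4).

```java
// Hypothetical stand-in for the factors named in the explain output; NOT Lucene code.
class TermScoreSketch {

    /** tf as the square root of the raw frequency, as the explain output suggests. */
    static float tf(int termFreq) {
        return (float) Math.sqrt(termFreq);
    }

    /** weight = queryWeight * fieldWeight, laid out as in the explain output. */
    static float weight(int termFreq, float idf, float queryNorm, float fieldNorm) {
        float queryWeight = idf * queryNorm;                 // idf * queryNorm
        float fieldWeight = tf(termFreq) * idf * fieldNorm;  // tf * idf * fieldNorm
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        float idf = 7.0507255f;  // idf(docFreq=10816) from the explain output

        // queryNorm overridden to 1, but the stored fieldNorm still applies.
        System.out.println(weight(1, idf, 1.0f, 7.5161928E9f));

        // After reindexing with lengthNorm = 1, only tf and idf remain.
        System.out.println(weight(1, idf, 1.0f, 1.0f));
    }
}
```

Note that idf enters twice, once through queryWeight and once through
fieldWeight, so with both norms at 1 the term weight is tf * idf * idf
rather than tf * idf.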

Alternately, as Anson suggested below, you can hack the source for the
Hits class. Note that his change removes the normalization Hits applies
against the top score; it does not affect the fieldNorm stored in the index.
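
For reference, the effect of the Hits snippet quoted at the bottom of this
thread can be sketched in isolation. HitsNormSketch is a hypothetical class
operating on a plain array of raw scores sorted in descending order; it only
mirrors the quoted lines and is not the actual Hits implementation.

```java
// Hypothetical stand-in mirroring the quoted getMoreDocs() lines; NOT the Hits class.
class HitsNormSketch {

    /**
     * Scales all scores by 1/top when the top score exceeds 1.
     * Assumes rawScores is sorted in descending order, so index 0 is the top hit.
     */
    static float[] normalize(float[] rawScores) {
        float scoreNorm = 1.0f;
        if (rawScores.length > 0 && rawScores[0] > 1.0f) {
            scoreNorm = 1.0f / rawScores[0];
        }
        float[] scaled = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            scaled[i] = rawScores[i] * scoreNorm;
        }
        return scaled;
    }

    public static void main(String[] args) {
        float[] scaled = normalize(new float[] {4.0f, 2.0f, 1.0f});
        for (float s : scaled) {
            System.out.println(s);
        }
    }
}
```

This scaling would explain why a hit can report a score just under 1 while
its explain output shows a far larger raw weight. Fixing scoreNorm at 1, as
the quoted hack does, makes the reported score equal to the raw score.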

Hope this helps,

Josh

On 15 Jul 2004, Jones G wrote:

> Thanks. I tried overriding Similarity, returning 1 from lengthNorm and
> queryNorm, and calling setSimilarity on IndexSearcher with it.

Query: 1 Found: 1540632
Rank: 1	ID: 8157438	Score: 0.99999994
3.73650457E11 = weight(title:iron in 159395), product of:
  7.0507255 = queryWeight(title:iron), product of:
    7.0507255 = idf(docFreq=10816)
    1.0 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 159395), product of:
    1.0 = tf(termFreq(title:iron)=1)
    7.0507255 = idf(docFreq=10816)
    7.5161928E9 = fieldNorm(field=title, doc=159395)

How do I get rid of queryWeight, fieldWeight, and fieldNorm in the scoring?

I tried modifying TermQuery without much luck.


On Thu, 15 Jul 2004 Doug Cutting wrote :
>Have you looked at:
>
>http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
>
>in particular, at:
>
>http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
>http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#queryNorm(float)
>
>Doug
>
>Jones G wrote:
>>Sadly, I am still running into problems.
>>
>>Explain shows the following after the modification.
>>
>>Rank: 1     ID: 11285358    Score: 5.5740864E8
>>5.5740864E8 = product of:
>>   8.3611296E8 = sum of:
>>     8.3611296E8 = product of:
>>       6.6889037E9 = weight(title:iron in 1235940), product of:
>>         0.12621856 = queryWeight(title:iron), product of:
>>           7.0507255 = idf(docFreq=10816)
>>           0.017901499 = queryNorm
>>         5.2994613E10 = fieldWeight(title:iron in 1235940), product of:
>>           1.0 = tf(termFreq(title:iron)=1)
>>           7.0507255 = idf(docFreq=10816)
>>           7.5161928E9 = fieldNorm(field=title, doc=1235940)
>>       0.125 = coord(1/8)
>>     2.7106019E-8 = product of:
>>       1.08424075E-7 = sum of:
>>         5.7318403E-9 = weight(abstract:an in 1235940), product of:
>>           0.03711049 = queryWeight(abstract:an), product of:
>>             2.073038 = idf(docFreq=1569960)
>>             0.017901499 = queryNorm
>>           1.5445337E-7 = fieldWeight(abstract:an in 1235940), product of:
>>             1.0 = tf(termFreq(abstract:an)=1)
>>             2.073038 = idf(docFreq=1569960)
>>             7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
>>         1.0269223E-7 = weight(abstract:iron in 1235940), product of:
>>           0.111071706 = queryWeight(abstract:iron), product of:
>>             6.2046037 = idf(docFreq=25209)
>>             0.017901499 = queryNorm
>>           9.24558E-7 = fieldWeight(abstract:iron in 1235940), product of:
>>             2.0 = tf(termFreq(abstract:iron)=4)
>>             6.2046037 = idf(docFreq=25209)
>>             7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
>>       0.25 = coord(2/8)
>>   0.6666667 = coord(2/3)
>>Rank: 2     ID: 8157438     Score: 2.7870432E8
>>2.7870432E8 = product of:
>>   8.3611296E8 = product of:
>>     6.6889037E9 = weight(title:iron in 159395), product of:
>>       0.12621856 = queryWeight(title:iron), product of:
>>         7.0507255 = idf(docFreq=10816)
>>         0.017901499 = queryNorm
>>       5.2994613E10 = fieldWeight(title:iron in 159395), product of:
>>         1.0 = tf(termFreq(title:iron)=1)
>>         7.0507255 = idf(docFreq=10816)
>>         7.5161928E9 = fieldNorm(field=title, doc=159395)
>>     0.125 = coord(1/8)
>>   0.33333334 = coord(1/3)
>>Rank: 3     ID: 10543103    Score: 2.7870432E8
>>2.7870432E8 = product of:
>>   8.3611296E8 = product of:
>>     6.6889037E9 = weight(title:iron in 553967), product of:
>>       0.12621856 = queryWeight(title:iron), product of:
>>         7.0507255 = idf(docFreq=10816)
>>         0.017901499 = queryNorm
>>       5.2994613E10 = fieldWeight(title:iron in 553967), product of:
>>         1.0 = tf(termFreq(title:iron)=1)
>>         7.0507255 = idf(docFreq=10816)
>>         7.5161928E9 = fieldNorm(field=title, doc=553967)
>>     0.125 = coord(1/8)
>>   0.33333334 = coord(1/3)
>>Rank: 4     ID: 8753559     Score: 2.7870432E8
>>2.7870432E8 = product of:
>>   8.3611296E8 = product of:
>>     6.6889037E9 = weight(title:iron in 2563152), product of:
>>       0.12621856 = queryWeight(title:iron), product of:
>>         7.0507255 = idf(docFreq=10816)
>>         0.017901499 = queryNorm
>>       5.2994613E10 = fieldWeight(title:iron in 2563152), product of:
>>         1.0 = tf(termFreq(title:iron)=1)
>>         7.0507255 = idf(docFreq=10816)
>>         7.5161928E9 = fieldNorm(field=title, doc=2563152)
>>     0.125 = coord(1/8)
>>   0.33333334 = coord(1/3)
>>
>>I would like to get rid of all normalizations and just have TF and IDF.
>>What am I missing?
>>
>>
>>On Thu, 15 Jul 2004 Anson Lau wrote :
>>
>>>If you don't mind hacking the source:
>>>
>>>In Hits.java
>>>
>>>In method "getMoreDocs()"
>>>
>>>
>>>
>>>    // Comment out the following
>>>    //float scoreNorm = 1.0f;
>>>    //if (length > 0 && scoreDocs[0].score > 1.0f) {
>>>    //  scoreNorm = 1.0f / scoreDocs[0].score;
>>>    //}
>>>
>>>    // And just set scoreNorm to 1 (kept as a float, matching the original declaration).
>>>    float scoreNorm = 1.0f;
>>>
>>>
>>>I don't know if you can do it without going into the source.
>>>
>>>Anson
>>>
>>>
>>>-----Original Message-----
>>> From: Jones G [mailto:jones.g@rediffmail.com]
>>>Sent: Thursday, July 15, 2004 6:52 AM
>>>To: lucene-user@jakarta.apache.org
>>>Subject: Scoring without normalization!
>>>
>>>How do I remove document normalization from scoring in Lucene? I just want
>>>to stick to TF IDF.
>>>
>>>Thanks.
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>
>>
>
>


