lucene-java-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts
Date Fri, 25 May 2007 15:22:38 GMT
On 5/25/07, Walt Stoneburner <walt.stoneburner@gmail.com> wrote:
> In reading the math for scoring at the bottom of:
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
>
> It appears that if I can make tf() and idf(), term frequency and
> inverse document frequency respectively, both return 1, then coord(),
> which is now the primary factor of the product, is what I'm looking
> for.

Pretty close, I think.  There is still the length normalization factor
that biases short fields over long.  That's calculated at index time,
and stored in the "norm" along with the boost (they are multiplied
together).
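To make the remaining bias concrete, here is a toy Java model of the scoring formula from the Similarity javadoc, with tf() and idf() pinned to 1 and all boosts set to 1. It is not Lucene code; the class and method names are hypothetical. It assumes DefaultSimilarity-style definitions (coord = overlap/maxOverlap, lengthNorm = 1/sqrt(numTerms), queryNorm = 1/sqrt(sumOfSquaredWeights)). It also ignores the lossy one-byte encoding that real norms go through.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (not Lucene code) of the practical scoring formula with tf = idf = 1.
// Uses Set.of / List.of, so it needs Java 9 or later.
class FlatScoringModel {

    // coord(q,d): fraction of the query terms that occur in the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Modeled on DefaultSimilarity.lengthNorm: 1/sqrt(number of tokens in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // With idf = boost = 1, sumOfSquaredWeights equals the number of query terms.
    static float queryNorm(int queryTerms) {
        return (float) (1.0 / Math.sqrt(queryTerms));
    }

    // Scores one field (given as its token list) against a set of query terms.
    // omitNorms = true models Field.setOmitNorms(true): the norm is always 1.0.
    static float score(Set<String> queryTerms, List<String> fieldTokens, boolean omitNorms) {
        Set<String> distinct = new HashSet<>(fieldTokens);
        float norm = omitNorms ? 1.0f : lengthNorm(fieldTokens.size());
        int overlap = 0;
        float sum = 0.0f;
        for (String t : queryTerms) {
            if (distinct.contains(t)) {
                overlap++;
                // tf = 1, idf^2 = 1, term boost = 1, so each matched term contributes only its norm.
                sum += norm;
            }
        }
        return coord(overlap, queryTerms.size()) * queryNorm(queryTerms.size()) * sum;
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("lucene", "scoring", "norms");
        // Short field: matches 2 of the 3 query terms.
        List<String> shortField = List.of("lucene", "scoring");
        // Long field: matches all 3 query terms but has 12 tokens.
        List<String> longField = List.of("lucene", "scoring", "norms",
                "a", "b", "c", "d", "e", "f", "g", "h", "i");

        System.out.printf("norms kept:    short=%.4f long=%.4f%n",
                score(query, shortField, false), score(query, longField, false));
        System.out.printf("norms omitted: short=%.4f long=%.4f%n",
                score(query, shortField, true), score(query, longField, true));
    }
}
```

When norms are kept, the two-token field that matches two query terms outranks the twelve-token field that matches all three. When norms are omitted, the ordering follows the number of distinct query terms matched.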

You can change the Similarity used during indexing (its lengthNorm()
method computes this factor), or you can completely knock out norms
via Field.setOmitNorms(true).
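As a worked check, assume DefaultSimilarity's coord() and queryNorm(), tf() = idf() = 1, all boosts 1, and norms omitted so that norm(t,d) = 1. Under those assumptions the practical scoring formula from the Similarity javadoc collapses to a function of two quantities: o, the number of distinct query terms the document matches, and m, the number of terms in the query.

```latex
\mathrm{score}(q,d)
  = \underbrace{\frac{o}{m}}_{\mathrm{coord}}
    \cdot \underbrace{\frac{1}{\sqrt{m}}}_{\mathrm{queryNorm}}
    \cdot \sum_{t \in q \cap d} \bigl(1 \cdot 1^2 \cdot 1 \cdot 1\bigr)
  = \frac{o}{m} \cdot \frac{1}{\sqrt{m}} \cdot o
  = \frac{o^2}{m^{3/2}}
```

For a fixed query m is constant, so documents rank strictly by o. The score is not literally coord() alone, because the summation and queryNorm() also contribute, but the resulting order is the same.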

-Yonik
