lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: term frequency normalization
Date Fri, 13 Feb 2009 01:31:37 GMT

: The easiest way to change the tf calculation would be overwriting
: tf in an own implementation of Similarity like it's done in
: SweetSpotSimilarity. But the average term frequency of the
: document is missing. Is there a simple way to get or calc this
: number?

there was quite a bit of discussion about this in the archive ... the 
short answer is no: Lucene doesn't store that stat.  the long answer is 
that you could compute that stat yourself, either while indexing your 
documents (saving it somewhere your Similarity class knows to look for 
it), or by walking all of the TermFreqVectors when opening an 
IndexReader, and then set that property on your CustomSimilarity 
instance before executing any 
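
a rough sketch of the arithmetic behind the second option, in plain Java 
so it runs without Lucene on the classpath.  it assumes you have already 
pulled the per-term frequencies for each document out of the index (in 
the 2.x API that would be something like the int[] returned by 
TermFreqVector.getTermFrequencies() for each vector from 
IndexReader.getTermFreqVectors(docId), which only works for fields 
indexed with term vectors).  the class and method names here are made up 
for illustration, and "average term frequency" is taken to mean total 
term occurrences divided by the number of distinct terms:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class AverageTermFrequency {

    // Average tf of one document: total occurrences / number of distinct terms.
    // The input holds one frequency per distinct term in the document.
    static double averageTermFrequency(int[] termFrequencies) {
        if (termFrequencies.length == 0) {
            return 0.0; // empty document (or no term vector stored)
        }
        long total = 0;
        for (int freq : termFrequencies) {
            total += freq;
        }
        return (double) total / termFrequencies.length;
    }

    // Builds a docId -> average tf table; row i holds the frequencies of doc i.
    // A custom Similarity could be handed this map before searching.
    static Map<Integer, Double> perDocument(int[][] frequenciesByDoc) {
        Map<Integer, Double> averages = new LinkedHashMap<Integer, Double>();
        for (int docId = 0; docId < frequenciesByDoc.length; docId++) {
            averages.put(docId, averageTermFrequency(frequenciesByDoc[docId]));
        }
        return averages;
    }
}
```

for a document whose three distinct terms occur 3, 1 and 2 times, 
averageTermFrequency(new int[] {3, 1, 2}) gives 2.0.  computing the table 
once per reader keeps the per-query cost down, at the price of a full 
pass over the term vectors every time the index is reopened.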

the hope is that in the future "Flexible indexing" (which i only vaguely 
understand) will make it easier to record/manage stats like this while 
indexing and have Lucene keep track of them for you as part of the index 

