lucene-java-user mailing list archives

From Paul Elschot <>
Subject Re: Relative term frequency?
Date Tue, 07 Jun 2005 07:01:39 GMT
On Monday 06 June 2005 22:59, Andy Liu wrote:
> Is there a way to calculate term frequency scores that are relative to
> the number of terms in the field of the document?  We want to override
> tf() in this way to curb keyword spamming in web pages.  In
> Similarity, only the document's term frequency is passed into the tf()
> method:
> float tf(int freq)
> It would be nice to have something like:
> float tf(int freq, String fieldName, int numTerms)
> If this isn't available out of the box, how difficult would it be to
> hack up Lucene to allow for this?

Have a look here:

The code there scores terms by density (term frequency relative to
field length), and it uses a separate table that maps the norms stored
in the index to inverse document lengths. That table could be adapted
as needed. If adapting it is not enough, the code is probably still a
good starting point for what you need.
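Below is a minimal, self-contained sketch of the density idea. It is not the code referred to above and does not use Lucene. It assumes that the stored norm is 1/sqrt(numTerms), as returned by Lucene's DefaultSimilarity.lengthNorm. It also assumes a simple linear one-byte quantization in place of Lucene's own norm encoding. The class DensityScoring and the methods toNormByte, fromNormByte and density are hypothetical names for illustration only.

```java
/**
 * Hypothetical sketch, not Lucene code: scores a term by its density in a
 * field (frequency times inverse field length), recovering the inverse
 * length from a one-byte norm through a 256-entry lookup table.
 */
public class DensityScoring {

    // Assumption: the norm is 1/sqrt(numTerms), as in Lucene's
    // DefaultSimilarity.lengthNorm, but it is stored here with a simple
    // linear one-byte quantization instead of Lucene's own encoding.
    static byte toNormByte(float norm) {
        float clamped = Math.max(0f, Math.min(1f, norm));
        return (byte) Math.round(clamped * 255f); // 0..255 stored in a byte
    }

    static float fromNormByte(byte b) {
        return (b & 0xFF) / 255f; // read the byte as unsigned 0..255
    }

    // Lookup table: norm byte -> inverse field length. Because
    // norm = 1/sqrt(numTerms), 1/numTerms = norm * norm.
    // This is the table one would adapt to change how length is treated.
    static final float[] INVERSE_LENGTH = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            float norm = fromNormByte((byte) i);
            INVERSE_LENGTH[i] = norm * norm;
        }
    }

    /** Density: occurrences of the term divided by the field length. */
    static float density(int freq, byte normByte) {
        return freq * INVERSE_LENGTH[normByte & 0xFF];
    }

    public static void main(String[] args) {
        // A 100-term field: norm = 1/sqrt(100) = 0.1.
        byte norm = toNormByte((float) (1.0 / Math.sqrt(100)));
        // Prints a value close to 5/100 = 0.05; quantization adds some error.
        System.out.println(density(5, norm));
    }
}
```

Because the score is a ratio, repeating a keyword helps only in proportion to the total length of the field. Changing the table entries changes how length is treated without re-indexing. One example would be capping the inverse length for very short fields.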

Paul Elschot.
