lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Relative term frequency?
Date Tue, 07 Jun 2005 07:01:39 GMT
On Monday 06 June 2005 22:59, Andy Liu wrote:
> Is there a way to calculate term frequency scores that are relative to
> the number of terms in the field of the document?  We want to override
> tf() in this way to curb keyword spamming in web pages.  In
> Similarity, only the document's term frequency is passed into the tf()
> method:
> 
> float tf(int freq)
> 
> It would be nice to have something like:
> 
> float tf(int freq, String fieldName, int numTerms)
> 
> If this isn't available out of the box, how difficult would it be to
> hack up Lucene to allow for this?

Have a look here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=31784

It scores terms by density and it uses a separate table mapping
the norms stored in the index to inverse doc lengths. 
This table could be adapted as needed.
When that is not enough, it's probably a good start for what
you need.

Regards,
Paul Elschot.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message