On Monday 06 June 2005 22:59, Andy Liu wrote:
> Is there a way to calculate term frequency scores that are relative to
> the number of terms in the field of the document? We want to override
> tf() in this way to curb keyword spamming in web pages. In
> Similarity, only the document's term frequency is passed into the tf()
> method:
>
> float tf(int freq)
>
> It would be nice to have something like:
>
> float tf(int freq, String fieldName, int numTerms)
>
> If this isn't available out of the box, how difficult would it be to
> hack up Lucene to allow for this?
Have a look here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=31784
It scores terms by density and it uses a separate table mapping
the norms stored in the index to inverse doc lengths.
This table could be adapted as needed.
When that is not enough, it's probably a good start for what
you need.
Regards,
Paul Elschot.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|