lucene-dev mailing list archives

From Marvin Humphrey <>
Subject Re: SweetSpotSimiliarity
Date Wed, 24 May 2006 22:14:08 GMT

On May 24, 2006, at 2:12 PM, Chris Hostetter wrote:

> : Adding a fieldName argument to would add
> : significant overhead, since it gets called a *lot*.
> ...why would having the extra param add significant overhead? ... or is
> the point just that if someone wants customized tf based on field name,
> it would be better to make that choice once when starting to score a
> query (since the choice is going to be the same for all docs) rather
> than every time tf is called, because the way a user chooses *might*
> involve a lot of overhead?

I suppose it's true that the default wouldn't suffer much, if at all
-- it'd just ignore the fieldName param.

   public float tf(float freq, String fieldName) {
     return tf(freq);
   }

However, if you wanted to override that behavior, you'd have to apply  
at least one conditional for each doc that the Scorer plows through.

   public float tf(float freq, String fieldName) {
     if ("title".equals(fieldName))
       return 1.0f;
     return tf(freq);
   }

That's going to be less efficient than overriding that method in an  
alternative Similarity instance for the field "title" and retrieving  
it once.   You never know how much until you benchmark it, of course.
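
To make the "retrieve it once" alternative concrete, here is a rough
sketch. None of these classes exist in Lucene: SketchSimilarity,
FlatTfSimilarity, PerFieldSimilarities and SketchScorer are hypothetical
names made up for illustration, and the square-root default only mirrors
what DefaultSimilarity does for tf. The point is just that the field
name is consulted once, when the scorer is built, and the per-doc loop
makes a plain tf(freq) call with no conditional.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Similarity; not Lucene's actual class.
class SketchSimilarity {
    // Default tf factor: square root of the raw term frequency.
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Hypothetical variant for a field where repeated terms should not
// raise the score, as in the "title" example above.
class FlatTfSimilarity extends SketchSimilarity {
    @Override
    public float tf(float freq) {
        return 1.0f;
    }
}

// Hypothetical registry mapping field names to Similarity instances.
class PerFieldSimilarities {
    private final Map<String, SketchSimilarity> byField = new HashMap<>();
    private final SketchSimilarity fallback = new SketchSimilarity();

    void register(String fieldName, SketchSimilarity sim) {
        byField.put(fieldName, sim);
    }

    // The only place the field name is inspected.
    SketchSimilarity forField(String fieldName) {
        return byField.getOrDefault(fieldName, fallback);
    }
}

// Hypothetical scorer: resolves its Similarity once, at construction.
class SketchScorer {
    private final SketchSimilarity sim;

    SketchScorer(PerFieldSimilarities sims, String fieldName) {
        this.sim = sims.forField(fieldName);
    }

    // Inner loop: one tf() call per doc, no field-name check.
    float sumTf(float[] freqs) {
        float total = 0f;
        for (float freq : freqs) {
            total += sim.tf(freq);
        }
        return total;
    }
}
```

Whether the virtual call here actually beats a string comparison per doc
is exactly the kind of thing that would need benchmarking.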

> Should a similar change be made to
> IndexWriter, and replace Similarity.lengthNorm(String,int) with
> lengthNorm(int) ?

I like it.  <evilgrin> That's one step closer towards assigning each  
Field a pluggable, comprehensive codec. </evilgrin>

Marvin Humphrey
Rectangular Research
