lucene-java-user mailing list archives

From "MH H" <>
Subject Re: Overriding Similarity
Date Sat, 19 Aug 2006 07:52:58 GMT
I had a situation where I was only interested in whether the term was
there or not (not how many times), and I didn't want to penalize long
fields. So I wrote a Similarity subclass in which I overrode the
following methods like this:

   public float lengthNorm(String fieldName, int numTerms) {
      return numTerms > 0 ? 1.0f : 0.0f;
   }

   public float tf(float freq) {
      return freq > 0 ? 1.0f : 0.0f;
   }
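
To make the effect of these two overrides concrete, here is a small standalone sketch that compares them with the usual defaults. It assumes the default formulas are sqrt(freq) for tf and 1/sqrt(numTerms) for lengthNorm; the class and method names are made up for this illustration and nothing here calls Lucene itself.

```java
// Standalone illustration; no Lucene classes are used.
final class BinarySimilarityDemo {

    // Assumed default formulas: sqrt(freq) and 1/sqrt(numTerms).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The overrides from the message: presence or absence only, no length penalty.
    static float binaryTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    static float binaryLengthNorm(int numTerms) {
        return numTerms > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // Each row is {freq, numTerms}: a short field, a repeated term, a long field.
        int[][] cases = { {1, 4}, {9, 4}, {1, 100} };
        for (int[] c : cases) {
            float def = defaultTf(c[0]) * defaultLengthNorm(c[1]);
            float bin = binaryTf(c[0]) * binaryLengthNorm(c[1]);
            System.out.printf("freq=%d numTerms=%d default=%.3f binary=%.3f%n",
                    c[0], c[1], def, bin);
        }
    }
}
```

With the binary versions, the tf * lengthNorm product is the same for every row in which the term occurs at all, which is the behaviour described above.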

And then I made this subclass the default similarity. It worked well
for tf but not for lengthNorm. The reason appears to be that the
TermScorer class does not call lengthNorm, but instead uses a cache
implemented as a static array in Similarity, made available through
static methods in Similarity. Since TermScorer calls these static
methods in Similarity, changing the default similarity has no effect
in this regard. So I ended up having to modify core Lucene by
changing the following code in Similarity:

   static {
      for (int i = 0; i < 256; i++)
         NORM_TABLE[i] = 1.0f; //Originally: NORM_TABLE[i] =
   }

This worked well, but I had hoped not to have to change core Lucene, so
if anyone has another or better solution, I would appreciate some tips.
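
The static-lookup behaviour described above can be reproduced in a toy model. The sketch below is not Lucene's source: BaseSimilarity, FlatSimilarity, normSeenByScorer and the placeholder table values are all invented for illustration. It only shows why a value read through a static method on the base class is unaffected by an overridden instance method.

```java
// Toy model of the problem; every name here is hypothetical, not Lucene's API.
final class StaticNormTableDemo {

    static class BaseSimilarity {
        // Shared decode table, filled once when the class is loaded.
        private static final float[] NORM_TABLE = new float[256];
        static {
            for (int i = 0; i < 256; i++) {
                NORM_TABLE[i] = i / 255.0f; // placeholder decoding, chosen only for this demo
            }
        }

        // Static methods are bound to BaseSimilarity; a subclass cannot override them.
        static float decodeNorm(byte b) {
            return NORM_TABLE[b & 0xFF];
        }

        // Instance method, so a subclass can override it.
        float lengthNorm(String fieldName, int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    static class FlatSimilarity extends BaseSimilarity {
        @Override
        float lengthNorm(String fieldName, int numTerms) {
            return numTerms > 0 ? 1.0f : 0.0f;
        }
    }

    // Stand-in for a scorer that only reads the static table and
    // never consults the similarity instance it was given.
    static float normSeenByScorer(BaseSimilarity similarity, byte storedNorm) {
        return BaseSimilarity.decodeNorm(storedNorm);
    }

    public static void main(String[] args) {
        BaseSimilarity custom = new FlatSimilarity();
        byte storedNorm = (byte) 51;
        System.out.println("overridden lengthNorm: " + custom.lengthNorm("body", 100));
        System.out.println("norm seen by scorer:   " + normSeenByScorer(custom, storedNorm));
    }
}
```

One thing that may be worth checking before patching the table: as far as I understand it, lengthNorm is applied when documents are indexed and the result is stored as a single byte, so a custom Similarity would need to be in effect at indexing time (and the documents reindexed) for the override to show up at search time.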


> : I am doing some documentation on scoring and I am interested in use
> : cases people have for overriding the DefaultSimilarity.  If you can
> : share what you did and why you did it, it would be much appreciated.
> I touched on this a little bit when I committed SweetSpotSimilarity...
> ...really any situation where you know more about your data than just that
> it's "text" is a situation where it *might* make sense to override your
> Similarity method.
> -Hoss
