lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: What replaces the computeNorm method in DefaultSimilarity in 4.1 now that the method is final
Date Tue, 26 Feb 2013 10:47:33 GMT
On 19/02/2013 11:42, Paul Taylor wrote:
> What replaces the computeNorm method in DefaultSimilarity in 4.1
>
> I've always subclassed DefaultSimilarity to resolve an issue whereby,
> when a document has multiple values in a field (because of a one-to-many
> relationship), it scores worse than a document which has just a single
> value. The computeNorm() method I overrode has gone, and when I tried to
> rewrite the method for 4.1 as
> follows
>
> public void  computeNorm(FieldInvertState state, Norm norm) {
>
>         if (state.getName().equals("alias")) {
>             if(state.getLength()>=3) {
>                 norm.setFloat(state.getBoost() * 0.578f);
>             }
>             else {
>                 super.computeNorm(state, norm);
>             }
>         }
>         else {
>             super.computeNorm(state, norm);
>         }
>     }
>
>
>
> I found it was final, so what should I do?
>
>
> 3.6 Code:
>
> package org.musicbrainz.search.analysis;
>
> import org.apache.lucene.index.FieldInvertState;
> import org.apache.lucene.search.DefaultSimilarity;
>
> /**
>  * Calculates a score for a match, overridden to deal with problems
>  * with alias fields in artist and label indexes
>  */
> public class MusicbrainzSimilarity extends DefaultSimilarity
> {
>
>     /**
>      * Calculates a value which is inversely proportional to the number of
>      * terms in the field. When multiple aliases are added to an artist (or
>      * label) they are seen as one field, so artists with many aliases can be
>      * disadvantaged when the matching alias is radically different from the
>      * other aliases.
>      *
>      * @return score component
>      */
>     public float computeNorm(String field, FieldInvertState state) {
>
>         // This will match both artist and label aliases and is applicable
>         // to both; didn't use the constant ArtistIndexField.ALIAS because
>         // that would be confusing
>         if (field.equals("alias")) {
>             if(state.getLength()>=3)
>             {
>                 // Same result as the normal calc if the field had three
>                 // terms, the most common scenario
>                 return state.getBoost() * 0.578f;
>             }
>             else {
>                 return super.computeNorm(field,state);
>             }
>         }
>         else
>         {
>             return super.computeNorm(field,state);
>         }
>     }
>
>
>     /**
>      * This method calculates a value based on how many times the search
>      * term was found in the field. Because we have only short fields, the
>      * only real case (apart from rare exceptions like Duran Duran Duran)
>      * whereby the term is found more than twice would be when a search
>      * term matches multiple aliases. To remove the bias this gives towards
>      * artists/labels with many aliases, we limit the value to what would
>      * be returned if the term matched twice.
>      *
>      * Note: would prefer to do this just for the alias field, but the
>      * field is not passed as a parameter.
>      * @param freq
>      * @return score component
>      */
>     @Override
>     public float tf(float freq) {
>         if (freq > 2.0f) {
>             return 1.41f; //Same result as if matched term twice
>
>         } else {
>             return super.tf(freq);
>         }
>     }
> }
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
Found it: in 4.1 I need to override lengthNorm instead.

    @Override
    public float lengthNorm(FieldInvertState state) {

        if (state.getName().equals("alias")) {
            if (state.getLength() >= 3) {
                // Same result as the normal calc if the field had three
                // terms, the most common scenario
                return state.getBoost() * 0.578f;
            }
            else {
                return super.lengthNorm(state);
            }
        }
        else {
            return super.lengthNorm(state);
        }
    }
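
For anyone wondering where the constants 0.578f and 1.41f come from, here is a minimal plain-Java sketch of the arithmetic. It has no Lucene dependency. It assumes the default length norm is boost / sqrt(numTerms) and the default tf is sqrt(freq), and it ignores overlap discounting and the lossy encoding Lucene applies when it stores norms. The class and method names are hypothetical and exist only for illustration.

```java
// Illustrative sketch only: these names are hypothetical, not Lucene API.
final class AliasNormSketch {

    // Assumed default length norm: boost * 1/sqrt(numTerms).
    static float defaultLengthNorm(int numTerms, float boost) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // Mirrors the override: three or more terms are treated like three terms.
    static float cappedLengthNorm(int numTerms, float boost) {
        if (numTerms >= 3) {
            return boost * 0.578f; // roughly 1/sqrt(3)
        }
        return defaultLengthNorm(numTerms, boost);
    }

    // Assumed default term-frequency factor: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Mirrors the tf override: more than two matches count like two.
    static float cappedTf(float freq) {
        return freq > 2.0f ? 1.41f : defaultTf(freq); // 1.41 is roughly sqrt(2)
    }

    public static void main(String[] args) {
        // Compare default and capped norms for fields of growing length.
        for (int n : new int[] {1, 2, 3, 10, 50}) {
            System.out.printf("terms=%d default=%.3f capped=%.3f%n",
                    n, defaultLengthNorm(n, 1.0f), cappedLengthNorm(n, 1.0f));
        }
        // Compare default and capped tf for growing match counts.
        for (float f : new float[] {1f, 2f, 3f, 8f}) {
            System.out.printf("freq=%.0f default=%.3f capped=%.3f%n",
                    f, defaultTf(f), cappedTf(f));
        }
    }
}
```

The point of the cap is that the norm stops shrinking once a field reaches three terms, so a record with fifty alias terms is penalised no more than one with three.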
