lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Maximum Term Frequency and Minimum Document Length
Date Thu, 05 Feb 2009 12:08:56 GMT
In schema.xml, at the very bottom, you should see:

   <!--
   <similarity class="com.example.solr.CustomSimilarityFactory">
     <str name="paramkey">param value</str>
   </similarity>
   -->

I believe creating the factory wrapper around your Similarity is pretty simple. See http://wiki.apache.org/solr/SolrPlugins
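A factory wrapper mostly turns the string parameters of the <similarity> element into a configured Similarity. The sketch below shows only that parameter-handling half and rests on assumptions not in this thread: the parameter names `minNumTerms` and `maxTermFrequency` are hypothetical, and a plain `Map<String, String>` stands in for Solr's parameter object so the code compiles without Solr on the classpath. A real plugin would instead follow the factory pattern described on the SolrPlugins wiki page and hand back a configured Similarity.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, Solr-free sketch of a similarity factory wrapper: it turns the
 * string parameters of a {@code <similarity>} element into typed settings for
 * a LimitingSimilarity-style class. A Map stands in for Solr's parameter object.
 */
public class LimitingSimilarityFactorySketch {

    /** Typed settings that a LimitingSimilarity would be constructed from. */
    public static final class Settings {
        public final int minNumTerms;
        public final float maxTermFrequency;

        Settings(int minNumTerms, float maxTermFrequency) {
            this.minNumTerms = minNumTerms;
            this.maxTermFrequency = maxTermFrequency;
        }
    }

    private final Map<String, String> params = new HashMap<String, String>();

    /** Receives the name/value pairs from the {@code <str name="...">} children. */
    public void init(Map<String, String> configuredParams) {
        params.clear();
        params.putAll(configuredParams);
    }

    /**
     * Parses the assumed "minNumTerms" and "maxTermFrequency" parameters.
     * The defaults (1 and +Infinity) effectively turn both clamps off.
     */
    public Settings getSettings() {
        int minNumTerms = Integer.parseInt(params.getOrDefault("minNumTerms", "1"));
        float maxTermFrequency =
                Float.parseFloat(params.getOrDefault("maxTermFrequency", "Infinity"));
        if (minNumTerms < 1) {
            throw new IllegalArgumentException("minNumTerms must be >= 1");
        }
        if (!(maxTermFrequency > 0f)) {
            throw new IllegalArgumentException("maxTermFrequency must be > 0");
        }
        return new Settings(minNumTerms, maxTermFrequency);
    }
}
```

With <str name="minNumTerms">25</str> and <str name="maxTermFrequency">4</str> inside the <similarity> element, getSettings() would yield 25 and 4.0; leaving them out falls back to defaults that leave scoring as DefaultSimilarity would compute it for non-empty fields.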

On Feb 4, 2009, at 7:29 PM, Jonah Schwartz wrote:

> We want to configure Solr so that fields are indexed with a maximum term
> frequency and a minimum document length. If a term appears more than N
> times in a field, it will be considered to have appeared only N times. If a
> document's length is under M terms, it will be considered to have exactly M
> terms. We have done this in the past in raw Lucene by writing a Similarity
> class like this:
>
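> import org.apache.lucene.search.DefaultSimilarity;
>
> public class LimitingSimilarity extends DefaultSimilarity {
>   private final int minNumTerms;         // shorter fields are normalized as if this long
>   private final float maxTermFrequency;  // higher term counts are capped at this value
>
>   public LimitingSimilarity(int minNumTerms, float maxTermFrequency) {
>     this.minNumTerms = minNumTerms;
>     this.maxTermFrequency = maxTermFrequency;
>   }
>
>   public float lengthNorm(String fieldName, int numTerms) {
>     return super.lengthNorm(fieldName, Math.max(minNumTerms, numTerms));
>   }
>
>   public float tf(float freq) {
>     return super.tf(Math.min(maxTermFrequency, freq));
>   }
> }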
>
>
> Is there a better way to do this within the Solr configuration files?
>
> Thanks,
> Jonah
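To make the effect of the two clamps concrete, here is a standalone sketch that needs no Lucene jar. It assumes the Lucene 2.x DefaultSimilarity formulas, tf(freq) = sqrt(freq) and lengthNorm = 1/sqrt(numTerms), and uses illustrative limits (N = 4, M = 25) that are not from the thread.

```java
// Standalone sketch of how clamping changes Lucene 2.x DefaultSimilarity's
// tf() and lengthNorm() values. The two default* methods mirror
// DefaultSimilarity: tf = sqrt(freq), lengthNorm = 1 / sqrt(numTerms).
public class ClampedScoringDemo {

    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Term frequencies above maxTermFrequency count as maxTermFrequency.
    static float clampedTf(float freq, float maxTermFrequency) {
        return defaultTf(Math.min(maxTermFrequency, freq));
    }

    // Fields shorter than minNumTerms are normalized as if they had minNumTerms terms.
    static float clampedLengthNorm(int numTerms, int minNumTerms) {
        return defaultLengthNorm(Math.max(minNumTerms, numTerms));
    }

    public static void main(String[] args) {
        float maxTermFrequency = 4f; // illustrative N
        int minNumTerms = 25;        // illustrative M
        for (float freq : new float[] {1f, 4f, 16f}) {
            System.out.printf("freq=%.0f  default tf=%.2f  clamped tf=%.2f%n",
                    freq, defaultTf(freq), clampedTf(freq, maxTermFrequency));
        }
        for (int numTerms : new int[] {4, 25, 100}) {
            System.out.printf("terms=%d  default norm=%.2f  clamped norm=%.2f%n",
                    numTerms, defaultLengthNorm(numTerms),
                    clampedLengthNorm(numTerms, minNumTerms));
        }
    }
}
```

With these limits a term seen 16 times gets the same tf (2.0) as one seen 4 times, and a 4-term field gets the same norm (0.2) as a 25-term field. Keep in mind that lengthNorm is computed at index time and stored in the field norms, so existing documents must be reindexed before a changed lengthNorm affects their scores, whereas tf is applied at search time.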

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ