lucene-solr-user mailing list archives

From Jonah Schwartz <jonah...@gmail.com>
Subject Re: Maximum Term Frequency and Minimum Document Length
Date Thu, 05 Feb 2009 19:51:17 GMT
That seems to work. Thanks!
-Jonah

On Thu, Feb 5, 2009 at 4:08 AM, Grant Ingersoll <gsingers@apache.org> wrote:

> In schema.xml, at the very bottom, you should see:
>
>  <!--
>  <similarity class="com.example.solr.CustomSimilarityFactory">
>    <str name="paramkey">param value</str>
>  </similarity>
>  -->
>
> I believe creating the Factory wrapper is pretty simple.  See
> http://wiki.apache.org/solr/SolrPlugins
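
One part of such a wrapper does not depend on the Solr version: turning the <str> values from schema.xml into numeric limits. The following is a minimal sketch of that step only, under stated assumptions. The Map stands in for whatever parameter object Solr hands to the factory, and the class name LimitParams, the keys maxTermFrequency and minNumTerms, and the "no limit" defaults are all hypothetical. A real wrapper would extend Solr's SimilarityFactory (see the SolrPlugins page) and pass these values on to the custom Similarity.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: parses the limits a similarity factory could receive from schema.xml.
// The Map stands in for Solr's parameter object; all names here are hypothetical.
final class LimitParams {
    final float maxTermFrequency; // N: cap on term frequency
    final int minNumTerms;        // M: floor on field length

    LimitParams(Map<String, String> params) {
        String n = params.get("maxTermFrequency");
        String m = params.get("minNumTerms");
        // Fall back to values that impose no limit when a key is absent.
        this.maxTermFrequency = (n == null) ? Float.MAX_VALUE : Float.parseFloat(n.trim());
        this.minNumTerms = (m == null) ? 0 : Integer.parseInt(m.trim());
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("maxTermFrequency", "10");
        params.put("minNumTerms", "50");
        LimitParams limits = new LimitParams(params);
        System.out.println(limits.maxTermFrequency + " " + limits.minNumTerms);
    }
}
```

Keeping the parsing in one small class means the Similarity itself only ever sees validated numbers, and a missing key leaves the default scoring unchanged.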
>
>
> On Feb 4, 2009, at 7:29 PM, Jonah Schwartz wrote:
>
>> We want to configure Solr so that fields are indexed with a maximum term
>> frequency and a minimum document length. If a term appears more than N
>> times in a field, it will be considered to have appeared only N times. If
>> a document's length is under M terms, it will be considered to be exactly
>> M terms long. We have done this in the past in raw Lucene by writing a
>> Similarity class like this:
>>
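>> public class LimitingSimilarity extends DefaultSimilarity {
>>  // minNumTerms (M) and maxTermFrequency (N) are fields whose
>>  // declarations are not shown in this message.
>>  public float lengthNorm(String fieldName, int numTerms) {
>>      // Treat a field shorter than M terms as having exactly M terms.
>>      return super.lengthNorm(fieldName, Math.max(minNumTerms, numTerms));
>>  }
>>  public float tf(float freq) {
>>      // Count a term at most N times.
>>      freq = Math.min(maxTermFrequency, freq);
>>      return super.tf(freq);
>>  }
>> }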
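
As a rough illustration of what the two overrides above do to the scoring inputs, here is a standalone sketch that does not depend on Lucene. It assumes the DefaultSimilarity formulas tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms). The class name ClampDemo and the limits N = 10 and M = 50 are made up for the example, since the message does not give values.

```java
// Standalone sketch: mirrors the clamping in LimitingSimilarity without Lucene.
// Assumes DefaultSimilarity's tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms).
final class ClampDemo {
    // Hypothetical limits; the original message does not give values for N and M.
    static final float MAX_TERM_FREQUENCY = 10f; // N
    static final int MIN_NUM_TERMS = 50;         // M

    // Term-frequency factor with freq capped at N.
    static float tf(float freq) {
        return (float) Math.sqrt(Math.min(MAX_TERM_FREQUENCY, freq));
    }

    // Length norm with the field length floored at M.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(Math.max(MIN_NUM_TERMS, numTerms)));
    }

    public static void main(String[] args) {
        System.out.println(tf(4f));          // below the cap: unchanged
        System.out.println(tf(400f));        // above the cap: same as tf(10f)
        System.out.println(lengthNorm(5));   // below the floor: same as lengthNorm(50)
        System.out.println(lengthNorm(200)); // above the floor: unchanged
    }
}
```

The effect is that repeating a term beyond N occurrences no longer raises the score, and very short fields no longer get a larger length boost than a field of M terms.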
>>
>>
>> Is there a better way to do this within the Solr configuration files?
>>
>> Thanks,
>> Jonah
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ