lucene-solr-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: ngramfilter minGramSize problem
Date Sun, 06 Apr 2014 21:02:17 GMT
Hi Andreas;

I implemented a similar feature in EdgeNGramFilter because some Solr
users wanted it. My patch is here:
https://issues.apache.org/jira/browse/SOLR-5332 However, if you read the
conversation below the issue, you will see that you can achieve this
another way.
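
As a rough illustration of one such alternative (a sketch only, and not
necessarily the approach discussed on the issue): keep a second copy of the
field that is analyzed without NGramFilterFactory, so that one- and
two-character words are indexed whole, and search both fields. The names
text_de_exact, content and content_exact below are hypothetical; "content"
stands for whichever field currently uses text_de.

```xml
<!-- Hypothetical field type: like text_de, but without the NGram filter,
     so tokens shorter than minGramSize are still indexed as-is. -->
<fieldType name="text_de_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field names; adjust to the actual schema. -->
<field name="content" type="text_de" indexed="true" stored="true"/>
<field name="content_exact" type="text_de_exact" indexed="true" stored="false"/>

<!-- Copy the raw input so that both fields are filled from one source value. -->
<copyField source="content" dest="content_exact"/>
```

A query would then need to cover both fields, for example with the edismax
parser and qf=content content_exact. This leaves minGramSize at 3 for the
n-gram field, so the partial matches there stay as narrow as before.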

Thanks;
Furkan KAMACI


2014-04-06 23:24 GMT+03:00 Andreas Owen <aowen@swissonline.ch>:

> I have a field type that uses NGramFilter while indexing. Is there a
> setting that can force the NGramFilter to index words shorter than
> minGramSize? Mine is set to 3, and the search won't find words that are only 1
> or 2 characters long. I would rather not set minGramSize=1 because the results
> would be too diverse.
>
> fieldtype:
>
> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <!-- <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/> -->
>     <!-- remove common words -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_de.txt" format="snowball"
>             enablePositionIncrements="true"/>
>     <filter class="solr.GermanNormalizationFilterFactory"/>
>     <!-- remove noun/adjective inflections like plural endings -->
>     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <!-- remove common words -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_de.txt" format="snowball"
>             enablePositionIncrements="true"/>
>     <filter class="solr.GermanNormalizationFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   </analyzer>
> </fieldType>
>
