lucene-solr-user mailing list archives

From "Andreas Owen" <ao...@swissonline.ch>
Subject Re: ngramfilter minGramSize problem
Date Sun, 06 Apr 2014 22:05:20 GMT
I thought I could use <filter class="solr.LengthFilterFactory" min="1" max="2"/>
to index and search words that are only 1 or 2 chars long. It seems to
work, but I have to test it some more.
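Here is a minimal sketch of one way the LengthFilterFactory idea could be wired up. It assumes the short words go into a separate field. The field type text_de_short and the fields content and content_short are hypothetical names and are not part of the schema quoted below. A separate field is assumed because LengthFilterFactory keeps only tokens whose length lies between min and max. With min="1" max="2" in the same chain as NGramFilterFactory minGramSize="3", that field would end up with no tokens at all.

```xml
<!-- Hypothetical schema.xml fragment: a second field type that keeps only
     1- and 2-character tokens, so text_de can keep minGramSize="3". -->
<fieldType name="text_de_short" class="solr.TextField" positionIncrementGap="100">
  <!-- a single analyzer is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drops every token shorter than min or longer than max -->
    <filter class="solr.LengthFilterFactory" min="1" max="2"/>
  </analyzer>
</fieldType>

<!-- "content" and "content_short" are hypothetical field names -->
<field name="content" type="text_de" indexed="true" stored="true"/>
<field name="content_short" type="text_de_short" indexed="true" stored="false"/>
<copyField source="content" dest="content_short"/>
```

At query time, both fields could be searched together, for example with the edismax parser and qf=content content_short. In that setup, 1- and 2-character terms would match content_short, and longer terms would match the n-grams indexed in content.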


On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen <aowen@swissonline.ch> wrote:

> I have a fieldType that uses NGramFilter while indexing. Is there a
> setting that can force the NGramFilter to index words shorter than the
> minGramSize? Mine is set to 3, and searches won't find words that are
> only 1 or 2 chars long. I would rather not set minGramSize="1" because
> the results would be too diverse.
>
> fieldtype:
>
> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <!-- <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/> -->
>     <!-- remove common words -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_de.txt" format="snowball"
>             enablePositionIncrements="true"/>
>     <filter class="solr.GermanNormalizationFilterFactory"/>
>     <!-- remove noun/adjective inflections like plural endings -->
>     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <!-- remove common words -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_de.txt" format="snowball"
>             enablePositionIncrements="true"/>
>     <filter class="solr.GermanNormalizationFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>   </analyzer>
> </fieldType>


