lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by JackKrupansky
Date Wed, 04 Jul 2012 22:15:53 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by JackKrupansky:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=133&rev2=134

  
  Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria", "nigerian"
  
- By default, minGramSize is 1, maxGramSize is 1 and side is "front". You can also set side
to generate the ngrams from right to left by setting "side" to a value of "back"
+ By default, minGramSize is 1, maxGramSize is 1 and side is "front". You can also set side
to "back" to generate the ngrams from right to left.
  
- minGramSize - the minimum number of characters to start with. For example, minGramSize=4
would mean that a word like '''Apache''' => "Apac", "Apach", "Apache" would be the 3 tokens
stored.
+ minGramSize - the minimum number of characters to start with. For example, minGramSize=4
would mean that a word like '''Apache''' => "Apac", "Apach", "Apache" would be the 3 tokens
output.
  
- This !FilterFactory is very useful in matching substrings of particular terms in the index
during query time.
+ This !FilterFactory is very useful in matching prefix substrings (or suffix substrings if
side="back") of particular terms in the index during query time. Edge n-gram analysis can
be performed at either index or query time (or both), but typically it is more useful, as
shown in this example, to generate the n-grams at index time with all of the n-grams indexed
at the same position. At query time the query term can be matched directly without any n-gram
analysis. Unlike wildcards, n-gram query terms can be used within quoted phrases.
  
  {{{
- <fieldtype name="testedgengrams" class="solr.TextField">
-    <analyzer>
+ <fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
+    <analyzer type="index">
-      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
+       <tokenizer class="solr.LowerCaseTokenizerFactory"/>
-      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
+       <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
     </analyzer>
+    <analyzer type="query">
+       <tokenizer class="solr.LowerCaseTokenizerFactory"/>
+    </analyzer>
- </fieldtype>
+ </fieldType>
  }}}
  <<Anchor(KeepWordFilter)>>
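
To make the minGramSize, maxGramSize and side parameters described above concrete, here is a minimal Python sketch of edge n-gram generation. It is an illustration only, not Solr's implementation: the function name `edge_ngrams` and its argument names are hypothetical and simply mirror the filter's attribute names, and the lowercasing done by the tokenizer in the example field type is applied by hand.

```python
def edge_ngrams(token, min_gram=1, max_gram=1, side="front"):
    """Return the edge n-grams of a single token.

    Hypothetical helper for illustration; the defaults mirror the
    documented ones (minGramSize=1, maxGramSize=1, side="front").
    """
    if side not in ("front", "back"):
        raise ValueError('side must be "front" or "back"')
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("require 1 <= min_gram <= max_gram")

    grams = []
    # Gram lengths run from min_gram up to max_gram, capped at the token length.
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        if side == "front":
            grams.append(token[:n])   # leading n characters
        else:
            grams.append(token[-n:])  # trailing n characters
    return grams


# The minGramSize=4 example from the text.
print(edge_ngrams("Apache", min_gram=4, max_gram=15))
# ['Apac', 'Apach', 'Apache']

# Index-time n-grams let an unanalyzed (only lowercased) query term match a prefix.
indexed_terms = set(edge_ngrams("Nigerian".lower(), min_gram=2, max_gram=15))
print("nige" in indexed_terms)
# True
```

With side="back" the same call yields suffixes instead, for example "ache", "pache", "Apache" for the token above. A token shorter than min_gram produces no n-grams at all in this sketch.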
  
