lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by JasonRutherglen
Date Tue, 19 Jan 2010 15:56:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by JasonRutherglen.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=67&rev2=68

--------------------------------------------------

  </fieldtype>
  }}}
  
+ <<Anchor(CommonGramsFilter)>>
+ ==== solr.CommonGramsFilterFactory ====
+ 
+ Creates `org.apache.solr.analysis.CommonGramsFilter`.
+ 
+ Makes shingles (e.g. the_cat) by combining common tokens (usually the same as the stop words
list) with regular tokens.  CommonGramsFilter is useful for issuing phrase queries (e.g. "the
cat") that contain stop words.  Normally, a phrase containing stop words would not match its
intended target; instead, the query "the cat" would match all documents containing "cat",
which can be undesirable behavior.  Phrase query slop (e.g. "the cat"~2) will not match any
documents, because common grams are indexed as shingled tokens that are adjacent to each other
(i.e. the_cat is indexed as a single term).
+ 
+ A customized common word list may be specified with the "words" attribute in the schema.
+ Optionally, the "ignoreCase" attribute may be used to ignore the case of tokens when comparing
to the common words list.
+ 
+ {{{
+ <fieldtype name="testcommongrams" class="solr.TextField">
+    <analyzer>
+      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
+      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
+    </analyzer>
+ </fieldtype>
+ }}}
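
As a rough illustration of the shingling described above, the following Python sketch mimics the idea on a plain list of tokens. It is not the `org.apache.solr.analysis.CommonGramsFilter` implementation: the function name `common_grams` and its parameters are hypothetical, and it assumes that a shingle is emitted for every adjacent pair in which at least one token is a common word, joined with "_" as in the the_cat example.

```python
def common_grams(tokens, common_words, ignore_case=True, sep="_"):
    """Return the input tokens plus a shingle for each adjacent pair
    in which at least one token is a common word.

    Simplified sketch only: token positions and types, which the real
    filter also tracks, are not modeled here.
    """
    if ignore_case:
        common = {w.lower() for w in common_words}
    else:
        common = set(common_words)

    def is_common(tok):
        # Mirror the "ignoreCase" attribute: compare case-insensitively if asked.
        return (tok.lower() if ignore_case else tok) in common

    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if is_common(tok) or is_common(nxt):
                # Assumption: the shingle follows the first token of the pair.
                out.append(tok + sep + nxt)
    return out


if __name__ == "__main__":
    print(common_grams(["the", "cat", "sat"], {"the"}))
```

Under these assumptions, the phrase query "the cat" can be answered by looking up the single term the_cat, which is why slop has no effect on it.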
  
  <<Anchor(KeepWordFilter)>>
  ==== solr.KeepWordFilterFactory ====
