lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by JasonRutherglen
Date Tue, 19 Jan 2010 16:05:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by JasonRutherglen.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=69&rev2=70

--------------------------------------------------

  
  Creates `org.apache.solr.analysis.CommonGramsFilter`. <!> [[Solr1.4]]
  
- Makes shingles (i.e. the_cat) by combining common tokens (usually the same as the stop words
list) and regular tokens.  CommonGramsFilter is useful for issuing phrase queries (i.e. "the
cat") that contain stop words.  Normally phrases contaning stop words would not match their
intended target and instead, the query "the cat" would match all documents containing "cat",
which can be undesirable behavior.  Phrase query slop (eg, "the cat"~2) will not match any
documents because common grams are indexed as shingled tokens that are adjacent to each other
(i.e. the_cat is indexed as a single term).
+ Makes shingles (i.e. the_cat) by combining common tokens (usually the same as the stop words
list) and regular tokens.  CommonGramsFilter is useful for issuing phrase queries (i.e. "the
cat") that contain stop words.  Normally phrases containing stop words would not match their
intended target and instead, the query "the cat" would match all documents containing "cat",
which can be undesirable behavior.  Phrase query slop (eg, "the cat"~2) will not function
as intended because common grams are indexed as shingled tokens that are adjacent to each
other (i.e. the_cat is indexed as a single term).  The CommonGramsQueryFilter converts the
phrase query "the cat" into the single term query the_cat.
  
  A customized common word list may be specified with the "words" attribute in the schema.
  Optionally, the "ignoreCase" attribute may be used to ignore the case of tokens when comparing
to the common words list.
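
As an illustration of the behavior described above, here is a minimal Python sketch of common-gram shingling at index time and at query time. It is a simplified model and not the Solr implementation: the function names, the "_" separator (taken from the the_cat notation above), and the flat token lists are assumptions made for the example, and token positions are ignored.

```python
from typing import List, Set

SEPARATOR = "_"  # assumed separator, following the the_cat notation above


def _pair_is_common(a: str, b: str, common: Set[str], ignore_case: bool) -> bool:
    """True if at least one token of the adjacent pair is a common word."""
    if ignore_case:
        a, b = a.lower(), b.lower()
    return a in common or b in common


def common_grams_index(tokens: List[str], common_words: Set[str],
                       ignore_case: bool = False) -> List[str]:
    """Index-time sketch: keep every token and add a shingle for each
    adjacent pair that contains a common word."""
    common = {w.lower() for w in common_words} if ignore_case else set(common_words)
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens) and _pair_is_common(tok, tokens[i + 1], common, ignore_case):
            out.append(tok + SEPARATOR + tokens[i + 1])
    return out


def common_grams_query(tokens: List[str], common_words: Set[str],
                       ignore_case: bool = False) -> List[str]:
    """Query-time sketch: emit the shingle where one exists and drop the
    single tokens that a shingle already covers."""
    common = {w.lower() for w in common_words} if ignore_case else set(common_words)
    out: List[str] = []
    covered_by_previous = False  # was this token the second half of a shingle?
    for i, tok in enumerate(tokens):
        starts_shingle = (i + 1 < len(tokens)
                          and _pair_is_common(tok, tokens[i + 1], common, ignore_case))
        if starts_shingle:
            out.append(tok + SEPARATOR + tokens[i + 1])
        elif not covered_by_previous:
            out.append(tok)
        covered_by_previous = starts_shingle
    return out


if __name__ == "__main__":
    words = {"the"}
    print(common_grams_index(["the", "cat"], words))   # ['the', 'the_cat', 'cat']
    print(common_grams_query(["the", "cat"], words))   # ['the_cat']
```

In this model the phrase query "the cat" reduces to the single term the_cat, which is why slop has nothing to act on: a single term has no second term whose distance could be relaxed.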
