lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by HossMan
Date Wed, 08 Dec 2010 20:35:33 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by HossMan.
The comment on this change is: some notes on when to use CharFilter courtesy of rmuir.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=96&rev2=97

--------------------------------------------------

    }
  }
  }}}
+ 
+ == When To Use a CharFilter vs a TokenFilter ==
+ 
+ There are several pairs of !CharFilters and !TokenFilters that have related functionality (e.g. !MappingCharFilter and !ASCIIFoldingFilter) or nearly identical functionality (e.g. !PatternReplaceCharFilterFactory and !PatternReplaceFilterFactory), and it may not always be obvious which is the best choice.
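+ 
+ As a rough illustration (a sketch, not a recommended configuration), here is how the two pattern-replace factories might appear in two hypothetical field types; the field type names and the pattern are made up for this example. The !CharFilter version rewrites the raw text before the tokenizer runs, while the !TokenFilter version is applied to each token the tokenizer has already produced:
+ 
+ {{{
+ <!-- hypothetical: remove hyphens from the raw text, before the tokenizer sees it -->
+ <fieldType name="text_charfilter" class="solr.TextField">
+   <analyzer>
+     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=""/>
+     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+   </analyzer>
+ </fieldType>
+ 
+ <!-- hypothetical: remove hyphens from each token, after tokenization -->
+ <fieldType name="text_tokenfilter" class="solr.TextField">
+   <analyzer>
+     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+     <filter class="solr.PatternReplaceFilterFactory" pattern="-" replacement="" replace="all"/>
+   </analyzer>
+ </fieldType>
+ }}}
+ 
+ With a whitespace tokenizer the two behave much the same for input like "wi-fi". The difference matters when the tokenizer itself splits on the characters being replaced: the !TokenFilter then only sees the pieces, so only the !CharFilter can change how the text is split.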
+ 
+ The right choice depends largely on which Tokenizer you are using, and whether you need to "outsmart" it by preprocessing the stream of characters before it is tokenized.
+ 
+ For example, you may be using a tokenizer such as !StandardTokenizer and be happy with how it works overall, but want to customize how some specific characters are handled.
+ 
+ In such a situation you could modify the grammar rules and rebuild your own tokenizer with JFlex, but it is probably easier to simply map those characters to something else with a !CharFilter before tokenization.
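+ 
+ A minimal sketch of that approach, assuming a hypothetical mapping file named {{{my-mappings.txt}}} in the core's conf directory and a made-up field type name. Because !MappingCharFilterFactory is declared before !StandardTokenizerFactory, the tokenizer only ever sees the already-mapped characters:
+ 
+ {{{
+ <fieldType name="text_mapped" class="solr.TextField" positionIncrementGap="100">
+   <analyzer>
+     <!-- runs on the raw character stream, before tokenization -->
+     <charFilter class="solr.MappingCharFilterFactory" mapping="my-mappings.txt"/>
+     <tokenizer class="solr.StandardTokenizerFactory"/>
+     <filter class="solr.LowerCaseFilterFactory"/>
+   </analyzer>
+ </fieldType>
+ }}}
+ 
+ The mapping file holds one {{{"source" => "target"}}} rule per line, with {{{#}}} starting a comment. For instance, this hypothetical rule would make the tokenizer see the word "and" wherever the text contains "&":
+ 
+ {{{
+ # hypothetical rule: adjust to the characters you need to control
+ "&" => " and "
+ }}}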
  
  = Notes On Specific Factories =
  == CharFilterFactories ==
