lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by JanHoydahl
Date Fri, 18 Jun 2010 10:14:24 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by JanHoydahl.


  == Stemming ==
- There are three types of stemming strategies:
+ There are four types of stemming strategies:
     * [[|Porter]] or Reduction stemming: a transforming algorithm that reduces the inflected
forms of a word, such as "runs" and "running", to a common root, e.g. "run". Porter stemming
must be performed ''both'' at insertion time and at query time, so that indexed terms and
query terms reduce to the same root.
+    * [[|Lucene-Hunspell]] aims to provide features such as stemming, decompounding,
spellchecking, normalization and term expansion by reusing the lexical resources already
created for, and widely used in, projects like OpenOffice. It is still an alpha version, but
it already supports an impressive list of languages (see [[|this presentation]] for more).
     * Expansion stemming: takes a root word and 'expands' it to all of its various
forms. It can be used ''either'' at insertion time ''or'' at query time. One way to
approach this is by using the [[#SynonymFilter|SynonymFilterFactory]].
     * [[/Kstem|KStem]], an alternative to Porter for developers looking for a less aggressive
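
The difference between reduction and expansion stemming in the list above can be sketched with a toy inverted index. This is only an illustration, not Solr or Lucene code: `toy_stem` is a crude suffix stripper standing in for a real stemmer (it is not the Porter algorithm), and `DOCS`, `GROUPS`, `EXPANSIONS` and the helper functions are hypothetical names invented for this sketch.

```python
from collections import defaultdict

# Hypothetical sample documents, invented for this illustration.
DOCS = {1: "she runs daily", 2: "running shoes", 3: "he ran home", 4: "walking"}

# Hypothetical expansion table: every form maps to all forms in its group.
GROUPS = [("run", "runs", "running", "ran")]
EXPANSIONS = {form: list(group) for group in GROUPS for form in group}


def toy_stem(token):
    """Crude suffix stripper; a stand-in for a reduction stemmer, NOT Porter."""
    for suffix in ("ning", "ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def reduce_token(token):
    """Reduction: map a token to a single root term."""
    return [toy_stem(token)]


def expand_token(token):
    """Expansion: map a token to every known form of the word."""
    return EXPANSIONS.get(token, [token])


def keep_token(token):
    """No stemming: use the token unchanged."""
    return [token]


def build_index(docs, analyze):
    """Build term -> set of document ids, applying `analyze` at insertion time."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for term in analyze(token):
                index[term].add(doc_id)
    return index


def search(index, query, analyze):
    """Return ids of documents matching any query term, applying `analyze` at query time."""
    hits = set()
    for token in query.lower().split():
        for term in analyze(token):
            hits |= index.get(term, set())
    return hits


# Reduction on both sides: "running" and "runs" meet at the root "run".
both_sides = search(build_index(DOCS, reduce_token), "running", reduce_token)

# Reduction at insertion time only: the raw query term is no longer in the index.
index_only = search(build_index(DOCS, reduce_token), "running", keep_token)

# Expansion at query time only, against an unstemmed index.
query_expanded = search(build_index(DOCS, keep_token), "run", expand_token)

# Expansion at insertion time only, with an unstemmed query.
index_expanded = search(build_index(DOCS, expand_token), "ran", keep_token)
```

In this sketch the reduction stemmer finds nothing when it is applied on one side only, which is why it has to run at both insertion and query time, while expansion works when applied on either side alone. The irregular form "ran" is matched only through the expansion table, because the toy suffix stripper leaves it unchanged.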
