lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by mlissner
Date Thu, 04 Aug 2011 22:09:25 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by mlissner:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=123&rev2=124

Comment:
Ran doesn't get stemmed to run. Porter ain't that smart.

  
  For a more complete list of what Tokenizers and !TokenFilters come out of the box, please
consult the [[http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html|javadocs]]
for the analysis package.  If you have any tips/tricks you'd like to mention about using any
of these classes, please add them below.
  
- For information about some language-specific !Tokenizers and !TokenFilters available in
Solr, please consult LanguageAnalysis.
+ For information about some language-specific Tokenizers and !TokenFilters available in Solr,
please consult LanguageAnalysis.
  
  '''Note:''' For a good background on Lucene Analysis, it's recommended that you read the
following sections in [[http://manning.com/lucene|Lucene In Action]]:
  
@@ -24, +24 @@

  == Stemming ==
  There are four types of stemming strategies:
  
-  * [[http://tartarus.org/~martin/PorterStemmer/|Porter]] or Reduction stemming — A transforming
algorithm that reduces any of the forms of a word such as "runs, running, ran", to its elemental
root e.g., "run". Porter stemming must be performed ''both'' at insertion time and at query
time.
+  * [[http://tartarus.org/~martin/PorterStemmer/|Porter]] or Reduction stemming — A transforming
algorithm that reduces any of the forms of a word such as "walks, walking, walked", to its
elemental root e.g., "walk". Porter stemming must be performed ''both'' at insertion time
and at query time.
   * [[http://code.google.com/p/lucene-hunspell/|Lucene-Hunspell]] aims to provide features
such as stemming, decompounding, spellchecking, normalization, term expansion, etc. taking
advantage of the existing lexical resources already created and widely-used in projects like
!OpenOffice. It is still an alpha version, but it already supports an impressive list of languages
(see [[http://lucene-eurocon.org/sessions-track2-day2.html#5|this presentation]] for more).
   * Expansion stemming — Takes a root word and 'expands' it to all of its various forms
— can be used ''either'' at insertion time ''or'' at query time.  One way to approach this
is by using the [[#SynonymFilter|SynonymFilterFactory]]
   * [[AnalyzersTokenizersTokenFilters/Kstem|KStem]], an alternative to Porter for developers
looking for a less aggressive stemmer.
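
The requirement that reduction stemming run ''both'' at insertion time and at query time can be illustrated with a small sketch. This is not Solr code and not the Porter algorithm: `toy_stem`, `analyze`, `build_index`, and `search` are hypothetical names, and the stemmer is a deliberately naive suffix stripper that only stands in for a real one. It shows that an index built from stemmed terms can only be matched by queries that are stemmed the same way, and that plain suffix stripping leaves an irregular form like "ran" untouched.

```python
from collections import defaultdict

# Hypothetical toy stemmer: strips a few common English suffixes.
# It is NOT the Porter algorithm; it only stands in for one here.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(token):
    """Return the token with the first matching suffix removed."""
    for suffix in SUFFIXES:
        stem = token[: -len(suffix)]
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(stem) >= 3:
            return stem
    return token


def analyze(text, stem=True):
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens


def build_index(docs):
    """Build an inverted index of stemmed term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query, stem_query=True):
    """Return ids of documents matching any (optionally stemmed) query term."""
    hits = set()
    for term in analyze(query, stem=stem_query):
        hits |= index.get(term, set())
    return hits


docs = {1: "She walked home", 2: "He walks daily", 3: "They ran fast"}
index = build_index(docs)

stemmed_hits = search(index, "walking")                      # query stemmed to "walk"
unstemmed_hits = search(index, "walking", stem_query=False)  # raw term "walking"
run_hits = search(index, "run")                              # "ran" was indexed as "ran"
```

In this sketch the stemmed query for "walking" matches documents 1 and 2, the unstemmed query matches nothing because the index holds only "walk", and a query for "run" does not find the document containing "ran".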
