lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by RobertMuir
Date Fri, 05 Feb 2010 15:24:08 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by RobertMuir.
The comment on this change is: beef up / disambiguate the snowball docs.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=73&rev2=74

--------------------------------------------------

  
    Example: "riding", "rides", "horses" ==> "ride", "ride", "hors".
  
+ Note: This filter differs very slightly from the "Porter" algorithm available in `solr.SnowballPorterFilter`:
it deviates slightly from the published algorithm.
+ For more details, see the section "Points of difference from the published algorithm"
[[http://tartarus.org/~martin/PorterStemmer/|here]].
+ 
  <<Anchor(EnglishPorterFilter)>>
  ==== solr.EnglishPorterFilterFactory ====
  
@@ -347, +350 @@

  
  Creates `org.apache.lucene.analysis.SnowballPorterFilter`.
  
- Creates an [[http://snowball.tartarus.org/algorithms/english/stemmer.html|Porter2 stemmer]]
from the Java classes generated from a [[http://snowball.tartarus.org/|Snowball]] specification.
 The language attribute is used to specify the language of the stemmer.
+ Creates an [[http://snowball.tartarus.org/texts/stemmersoverview.html|Snowball stemmer]]
from the Java classes generated from a [[http://snowball.tartarus.org/|Snowball]] specification.
 The language attribute is used to specify the language of the stemmer.
  {{{
  <fieldtype name="myfieldtype" class="solr.TextField">
    <analyzer>
@@ -358, +361 @@

  }}}
  
  Valid values for the language attribute (creates the snowball stemmer class language + "Stemmer"):
-  * Danish
-  * Dutch
-  * English
-  * Finnish
-  * French
-  * German2
-  * German
-  * Italian
-  * Kp
-  * Lovins
-  * Norwegian
-  * Porter
-  * Portuguese
-  * Russian
-  * Spanish
-  * Swedish
+  * [[http://snowball.tartarus.org/algorithms/danish/stemmer.html|Danish]]
+  * [[http://snowball.tartarus.org/algorithms/dutch/stemmer.html|Dutch]]
+  * [[http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html|Kp]]: The Kraaij-Pohlmann
stemming algorithm for Dutch.
+  * [[http://snowball.tartarus.org/algorithms/porter/stemmer.html|Porter]]: The original
Porter stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/english/stemmer.html|English]]: The Porter2
stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/lovins/stemmer.html|Lovins]]: The early Lovins
stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/finnish/stemmer.html|Finnish]]
+  * [[http://snowball.tartarus.org/algorithms/french/stemmer.html|French]]
+  * [[http://snowball.tartarus.org/algorithms/german/stemmer.html|German]]
+  * [[http://snowball.tartarus.org/algorithms/german2/stemmer.html|German2]]: A variation
of the German algorithm that allows ä, ö and ü to be represented by ae, oe and
ue.
+  * [[http://snowball.tartarus.org/algorithms/italian/stemmer.html|Italian]]
+  * [[http://snowball.tartarus.org/algorithms/norwegian/stemmer.html|Norwegian]]
+  * [[http://snowball.tartarus.org/algorithms/portuguese/stemmer.html|Portuguese]]
+  * [[http://snowball.tartarus.org/algorithms/russian/stemmer.html|Russian]]
+  * [[http://snowball.tartarus.org/algorithms/spanish/stemmer.html|Spanish]]
+  * [[http://snowball.tartarus.org/algorithms/swedish/stemmer.html|Swedish]]
  
+ <!> Gotchas:
+  * Although the Lovins stemmer is described as faster than Porter/Porter2, in practice it
is much slower in Solr, as it is implemented using reflection.
+  * Neither the Lovins nor the Finnish stemmer produces correct output (as of Solr 1.4), due
to a [[http://article.gmane.org/gmane.comp.search.snowball/1139|known bug in Snowball]].
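
As a hedged illustration of the language attribute and the gotchas above, a field type that selects the Porter2 stemmer via `language="English"` might look like the sketch below. The field type name `text_snowball_en` and the choice of tokenizer and lowercase filter are assumptions made for this example; they are not taken from the wiki page.

{{{
<!-- Illustrative sketch only: the name and the tokenizer/filter chain are assumptions. -->
<fieldtype name="text_snowball_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase before stemming; Snowball stemmers are generally applied to lowercased tokens. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- language="English" selects Porter2; language="Porter" would select the original Porter algorithm. -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldtype>
}}}

Given the gotchas listed above, `Lovins` and `Finnish` are probably best avoided as language values on Solr 1.4.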
  
  <<Anchor(WordDelimiterFilter)>>
  ==== solr.WordDelimiterFilterFactory ====
