lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "HunspellStemFilterFactory" by JanHoydahl
Date Sun, 25 Sep 2011 16:27:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "HunspellStemFilterFactory" page has been changed by JanHoydahl:
http://wiki.apache.org/solr/HunspellStemFilterFactory?action=diff&rev1=2&rev2=3

Comment:
Added note about varying quality

  
  The {{{dictionary}}} argument optionally takes a comma-separated list of dictionaries, in
which case all will be loaded, in the order specified. This lets you maintain your own custom
additions without needing to edit the originals. We encourage you to contribute your changes/additions
back to the maintainers of the [[http://wiki.services.openoffice.org/wiki/Dictionaries|original
dictionaries]].
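
As an illustration of the comma-separated {{{dictionary}}} argument described above, a field type might be sketched as follows. This is a minimal sketch, not taken from the wiki page: the field type name {{{text_hunspell}}} and the file names {{{nb_NO.dic}}}, {{{nb_NO.aff}}} and {{{my_additions.dic}}} are assumptions chosen for the example, and the surrounding tokenizer and lowercase filter are one plausible analysis chain among many.
{{{
<!-- Sketch only: the field type name and all file names are hypothetical. -->
<fieldType name="text_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Dictionaries are loaded in the order listed: the original first,
         then a separate file holding local additions. -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="nb_NO.dic,my_additions.dic"
            affix="nb_NO.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
}}}
Keeping local additions in their own file means the original dictionary can be replaced with a newer upstream version without merging changes by hand.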
  
- An example of how Hunspell is more accurate than Snowball, from Norwegian:
+ An example of how Hunspell may be more accurate than the Snowball stemmer, from Norwegian:
  {{{
                bil (car)    biler (cars)   billig (cheap)   billige           billigere (cheaper)
  Snowball      bil          bil            bil (car)        bil               billiger (N/A)
@@ -18, +18 @@

                             bile (drive)                    billige (pl)      billige (pl)
  }}}
  
+ <!> Note that Hunspell's suitability for stemming will vary depending on
the quality of the dictionaries and affix files. Always test several stemmers
before deciding which to use for your language. Another potential disadvantage of
a dictionary-based stemmer is that it only works for words listed in the dictionary, so be
prepared to invest some time in adding new or domain-specific vocabulary to the dictionaries.
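
One way to carry out the comparison suggested above is to define two otherwise identical field types that differ only in the stemming filter, then run the same sample words through both. The following is a rough sketch under stated assumptions: the field type names {{{text_no_hunspell}}} and {{{text_no_snowball}}} and the dictionary file names are hypothetical, and Norwegian is used only because it is the language of the example on this page.
{{{
<!-- Sketch only: field type names and dictionary file names are hypothetical. -->
<fieldType name="text_no_hunspell" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="nb_NO.dic" affix="nb_NO.aff" ignoreCase="true"/>
  </analyzer>
</fieldType>

<fieldType name="text_no_snowball" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Norwegian"/>
  </analyzer>
</fieldType>
}}}
Feeding a list of representative words from your own content, including domain-specific terms, through both field types on the analysis page of the Solr admin interface should show where the two stemmers disagree and which words the dictionary does not cover.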
+ 
