lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "LanguageAnalysis" by RobertMuir
Date Fri, 09 Jul 2010 15:45:06 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "LanguageAnalysis" page has been changed by RobertMuir.
The comment on this change is: add mention for hyphenation-based decompounding.


  Solr provides dictionary-based decompounding support via solr.DictionaryCompoundWordTokenFilterFactory.
This factory allows you to provide a dictionary, along with some settings (min/max subword
size, etc), to break compound words into pieces.
+ <!> [[Solr3.1]]
+ Additionally, you can use solr.HyphenationCompoundWordTokenFilterFactory. This factory uses
a hyphenation grammar in combination with an optional dictionary to break compound words into
pieces. Hyphenation grammars for a few languages can be found at the [[|FOP
XML Hyphenation Patterns]] site.
  One alternative is to use n-gram tokenization so that the search is less sensitive to compound
- TODO: Add support for Lucene's hyphenation grammar-based decompounding and document it here.
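
As a rough illustration of the hyphenation-based decompounding mentioned in the change above, a schema.xml field type might be configured along the following lines. This is only a sketch: the field type name "text_decompound" and the file names "hyphenation-grammar.xml" and "compound-dictionary.txt" are placeholders that do not appear on the wiki page, and the size values are illustrative rather than recommendations.

```xml
<!-- Sketch only: the field type name and both file names are hypothetical. -->
<fieldType name="text_decompound" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- hyphenator: the hyphenation grammar file (for example, one of the
         FOP XML hyphenation patterns).
         dictionary: optional word list used together with the grammar.
         minWordSize, minSubwordSize, maxSubwordSize: the min/max size settings. -->
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="hyphenation-grammar.xml"
            dictionary="compound-dictionary.txt"
            minWordSize="5"
            minSubwordSize="2"
            maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
```

Leaving out the dictionary attribute should make the filter rely on the hyphenation grammar alone, which matches the "optional dictionary" wording in the change.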
