lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by BrookeSchreierGanz
Date Mon, 21 May 2012 19:08:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by BrookeSchreierGanz:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=131&rev2=132

Comment:
added information about BeiderMorseFilterFactory (Beider-Morse Phonetic Matching), which was
added in Solr 3.6

    * default is 0
   * '''protected="protwords.txt"''' specifies a text file containing a list of words that
should be protected and passed through unchanged.
    * default is empty (no protected words)
-  * '''types="wdfftypes.txt"''' allows customized tokenization for this filter. The file
should exist in the solr/conf directory, and entries are of the form (without quotes) "% =>
ALPHA" or "\u002C => DIGIT". Allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM,
SUBWORD_DELIM. [Solr3.1] 
+  * '''types="wdfftypes.txt"''' allows customized tokenization for this filter. The file
should exist in the solr/conf directory, and entries are of the form (without quotes) "% =>
ALPHA" or "\u002C => DIGIT". Allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM,
SUBWORD_DELIM. [Solr3.1]
-   * See SOLR-2059, 
+   * See SOLR-2059,
  
  These parameters may be combined in any way.
  
@@ -624, +624 @@

  {{{
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  }}}
+ <<Anchor(BeiderMorseFilterFactory)>>
+ 
+ === solr.BeiderMorseFilterFactory ===
+ <!> [[https://wiki.apache.org/solr/Solr3.6|Solr3.6]]
+ 
+ Creates `org.apache.solr.analysis.BeiderMorsePhoneticFilter`.
+ 
+ Uses [[http://jakarta.apache.org/commons/codec/|commons codec]] to generate phonetic tokens
for surnames, so that names that sound alike but are spelled differently match each other.
This is especially useful for Central and Eastern European surnames. For example, a search
for "Crakowski" can find documents that contain the surname "Kracovsky", and vice versa.
For more information, see the paper about Beider-Morse Phonetic Matching (BMPM) at
http://stevemorse.org/phonetics/bmpm.htm.
+ 
+ {{{
+ <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true"
languageSet="auto"/>
+ }}}
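A minimal sketch of how the filter above might sit inside a complete field type in schema.xml. The field type name `text_bmpm` and the choice of `solr.StandardTokenizerFactory` are assumptions for illustration and are not part of the wiki change; only the `<filter>` element and its attribute values come from the page.

```xml
<!-- Hypothetical field type: the name and the tokenizer are illustrative assumptions. -->
<fieldType name="text_bmpm" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer with no type attribute is used at both index and query time,
       so a query term and an indexed surname pass through the same phonetic encoding. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Filter element as given on the wiki page. -->
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
</fieldType>
```

Sharing one analyzer between indexing and querying is the design choice that matters here: the phonetic tokens only match if both sides are encoded with the same settings.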
  <<Anchor(ShingleFilterFactory)>>
  
  === solr.ShingleFilterFactory ===
