lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: AW: Best way to match umlauts
Date Mon, 17 Jun 2013 13:17:40 GMT
And this is a key advantage of using the mapping char filter rather than the 
simple ASCII folding token filter - you can easily go in and modify the 
mappings for application/domain/environment-specific character mappings such 
as these.

-- Jack Krupansky

-----Original Message----- 
From: André Widhani
Sent: Monday, June 17, 2013 4:27 AM
To: solr-user@lucene.apache.org
Subject: AW: Best way to match umlauts

We configure both baseletter conversion (removing accents and umlauts) and 
alternate spelling through the mapping file.

For baseletter conversion, given mostly German content, we map all accented 
characters that are not used in German (such as French é, è, ê) to their 
base letter. We do not do this for German umlauts, because the assumption 
is that a user will know the correct spelling in his or her native language 
but probably not in foreign languages.
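
As a rough illustration of that rule outside Solr, the following Python 
sketch strips accents from every character except the German umlauts and 
"ß". The function name fold_non_german and the KEEP set are assumptions made 
for this example; the setup described here does the same job with entries in 
the mapping file, not with code.

```python
import unicodedata

# Characters left untouched. Assumption for this sketch: the German umlauts
# and eszett, as described in the paragraph above.
KEEP = set("äöüÄÖÜß")


def fold_non_german(text: str) -> str:
    """Reduce accented characters to their base letter, except German umlauts."""
    out = []
    # Compose first so that "u" + combining diaeresis is seen as a single "ü".
    for ch in unicodedata.normalize("NFC", text):
        if ch in KEEP:
            out.append(ch)
            continue
        # Decompose into base letter plus combining marks, then drop the marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


if __name__ == "__main__":
    print(fold_non_german("Café Müller"))
```

With this rule a search for "Cafe" can match "Café", while "Müller" keeps its 
umlaut for the alternate-spelling step.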

For alternate spelling, we use the following mapping:

  # * Alternate spelling
  #
  # Additionally, german umlauts are converted to their base form
  # ("ä" => "ae"), and "ß" is converted to "ss". Which means both spellings
  # can be used to find either one.
  #
  "\u00C4" => "AE"
  "\u00D6" => "OE"
  "\u00DC" => "UE"
  "\u00E4" => "ae"
  "\u00F6" => "oe"
  "\u00DF" => "ss"
  "\u00FC" => "ue"
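
To show why both spellings find each other, here is a minimal Python sketch 
that applies the same seven mappings to indexed text and to query text. The 
names UMLAUT_MAP and apply_mapping are made up for this example, and the 
lowercasing step stands in for a lowercase filter that is assumed to run 
later in the analysis chain. In Solr itself the mapping char filter reads 
the mapping file directly.

```python
# The seven mappings from the file above, as a plain dictionary.
UMLAUT_MAP = {
    "\u00C4": "AE",  # Ä
    "\u00D6": "OE",  # Ö
    "\u00DC": "UE",  # Ü
    "\u00E4": "ae",  # ä
    "\u00F6": "oe",  # ö
    "\u00DF": "ss",  # ß
    "\u00FC": "ue",  # ü
}


def apply_mapping(text: str, mapping: dict = UMLAUT_MAP) -> str:
    """Replace each mapped character; leave every other character unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


def normalize(text: str) -> str:
    """Apply the mapping, then lowercase (assumed to be a later analysis step)."""
    return apply_mapping(text).lower()


if __name__ == "__main__":
    # The same normalization runs at index time and at query time, so both
    # spellings end up as the same term.
    for indexed, query in [("Müller", "Mueller"), ("Straße", "Strasse")]:
        print(indexed, query, normalize(indexed) == normalize(query))
```

Because the mapping runs on both sides, it does not matter which spelling 
the document or the user used.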


André