lucene-java-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: umlauts / diacritic expansion
Date Tue, 16 Apr 2019 18:45:52 GMT
Hello Michael,

For the case of treating ü and ue as equivalent, take a look at GermanNormalizationFilter [1]; it folds both spellings to u.

Regards,
Markus

[1] https://lucene.apache.org/core/7_6_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
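To make the three spellings from the question quoted below concrete, here is a minimal plain-JDK sketch that produces the original, the expanded and the stripped form of a term. It is only an illustration of the forms involved, not Lucene code: the class UmlautVariants and its methods expand, strip and variants are hypothetical names, and it does not reimplement either filter mentioned in this thread.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
class UmlautVariants {

    // Expand German umlauts and sharp s to their conventional ASCII digraphs.
    static String expand(String s) {
        // Compose first so "u" + combining diaeresis is seen as one character.
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nfc.length(); i++) {
            char c = nfc.charAt(i);
            switch (c) {
                case '\u00e4': sb.append("ae"); break; // a-umlaut
                case '\u00f6': sb.append("oe"); break; // o-umlaut
                case '\u00fc': sb.append("ue"); break; // u-umlaut
                case '\u00c4': sb.append("Ae"); break; // A-umlaut
                case '\u00d6': sb.append("Oe"); break; // O-umlaut
                case '\u00dc': sb.append("Ue"); break; // U-umlaut
                case '\u00df': sb.append("ss"); break; // sharp s
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Strip diacritics: decompose to NFD, then drop all combining marks.
    // Characters without a decomposition (such as sharp s) are left as-is.
    static String strip(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Collect the distinct spellings of a term, in insertion order.
    static Set<String> variants(String term) {
        Set<String> out = new LinkedHashSet<>();
        out.add(term);
        out.add(expand(term));
        out.add(strip(term));
        return out;
    }

    public static void main(String[] args) {
        // "f\u00fcr" is "fuer" written with u-umlaut.
        System.out.println(variants("f\u00fcr"));
    }
}
```

For the term from the question, variants returns the form with the umlaut, "fuer" and "fur"; a term without diacritics yields a single entry. Emitting all of these at the same position at index time would be one way to match every spelling, but that is a design assumption and not something confirmed in this thread.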

 
 
-----Original message-----
> From: Ralf Heyde <ralf.heyde@gmx.de>
> Sent: Tuesday 16th April 2019 20:28
> To: java-user@lucene.apache.org
> Subject: Re: umlauts / diacritic expansion
> 
> Hey,
> 
> Take a look at ASCIIFoldingFilter - this one is quite generic.
> 
> Does this answer your question?
> 
> Cheers Ralf
> 
> Sent from my iPhone
> 
> > On 16.04.2019 at 20:08, Michael Sokolov <msokolov@gmail.com> wrote:
> > 
> > I'm learning how to index/search German today and understanding that
> > vowels with umlauts are conventionally expanded into two ASCII
> > characters, e.g. "für" -> "fuer", so people may search for the expanded
> > form "fuer", but they might also search with the diacritic, and
> > finally they might lazily search using the stripped form "fur".
> > 
> > My question: is there a standard CharFilter or TokenFilter that
> > expands to both (ASCII) forms, for characters with umlauts and perhaps
> > other diacritics I might be unaware of in other languages having
> > similar multiple renderings in ASCII?
> > 
> > -Mike
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> 
