lucene-java-user mailing list archives

From "Dyga, Adam" <adam.d...@beumergroup.com>
Subject RE: German 'ue' -> 'u' conversion
Date Mon, 19 Nov 2012 10:37:46 GMT
Yes, that would solve my question 2. I can convert all umlauts to their 'ue', 'ae', etc. forms
before the tokens reach the other filters, and it should work fine.
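
To make the intended matching behaviour concrete, here is a minimal plain-Java sketch. It does not use any Lucene classes: `UmlautExpansionSketch` and `expand` are hypothetical names, and a `HashSet` stands in for the index. The assumption is that the same expansion is applied to indexed text and to queries.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not a Lucene class: expands German umlauts to their
// two-letter transliteration so that the umlaut spelling and the 'ue'
// spelling of a word normalize to the same term.
final class UmlautExpansionSketch {

    static String expand(String text) {
        StringBuilder out = new StringBuilder(text.length() + 4);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u00E4': out.append("ae"); break; // small a-umlaut
                case '\u00F6': out.append("oe"); break; // small o-umlaut
                case '\u00FC': out.append("ue"); break; // small u-umlaut
                case '\u00C4': out.append("Ae"); break; // capital a-umlaut
                case '\u00D6': out.append("Oe"); break; // capital o-umlaut
                case '\u00DC': out.append("Ue"); break; // capital u-umlaut
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for an index: the set of normalized terms.
        Set<String> index = new HashSet<>();
        index.add(expand("\u00FCber")); // stored as "ueber"

        // Each query goes through the same normalization before lookup.
        System.out.println(index.contains(expand("\u00FCber"))); // true
        System.out.println(index.contains(expand("ueber")));     // true
        System.out.println(index.contains(expand("uber")));      // false
    }
}
```

Because plain 'u' is never rewritten, "uber" stays distinct from "ueber", which is the behaviour asked for in question 2.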

Thanks,
Adam


-----Original Message-----
From: Igal @ getRailo.org [mailto:igal@getrailo.org] 
Sent: 19 November 2012 11:15
To: java-user@lucene.apache.org
Subject: Re: German 'ue' -> 'u' conversion

if your needs are so specific -- you can always build a NormalizeCharMap and use MappingCharFilter
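
As a rough illustration of the idea behind a character-mapping table, the following plain-Java sketch replaces strings using the longest match at each position. It is not the Lucene API: `CharMappingSketch`, `add` and `apply` are hypothetical names chosen for this example. In Lucene itself the table would be a NormalizeCharMap and the replacement would be done by MappingCharFilter on the character stream, before tokenization.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a character-mapping table; not a Lucene class.
final class CharMappingSketch {
    private final Map<String, String> mappings = new HashMap<>();
    private int maxKeyLength = 0;

    // Registers one "match -> replacement" rule and returns this for chaining.
    CharMappingSketch add(String match, String replacement) {
        if (match.isEmpty()) {
            throw new IllegalArgumentException("match must not be empty");
        }
        mappings.put(match, replacement);
        maxKeyLength = Math.max(maxKeyLength, match.length());
        return this;
    }

    // Rewrites the input left to right, trying the longest rule first
    // at each position and copying unmatched characters unchanged.
    String apply(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            boolean replaced = false;
            int longest = Math.min(maxKeyLength, input.length() - i);
            for (int len = longest; len >= 1; len--) {
                String replacement = mappings.get(input.substring(i, i + len));
                if (replacement != null) {
                    out.append(replacement);
                    i += len;
                    replaced = true;
                    break;
                }
            }
            if (!replaced) {
                out.append(input.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Table for question 2: umlaut -> two-letter form.
        CharMappingSketch expand = new CharMappingSketch()
                .add("\u00E4", "ae").add("\u00F6", "oe").add("\u00FC", "ue");
        System.out.println(expand.apply("\u00FCber")); // ueber

        // Table for question 1: two-letter form -> folded vowel.
        CharMappingSketch fold = new CharMappingSketch()
                .add("ae", "a").add("oe", "o").add("ue", "u");
        System.out.println(fold.apply("ueber")); // uber
    }
}
```

Note that a blind 'ue' -> 'u' table also rewrites words in which 'ue' is not an umlaut spelling (for example "aktuell" becomes "aktull"), so the folding table is the riskier of the two.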


Igal


On 11/19/2012 2:11 AM, Dyga, Adam wrote:
> I did, but none of them can do it (at least in default configuration).
>
> Regards,
> AD
>
> -----Original Message-----
> From: Igal @ getRailo.org [mailto:igal@getrailo.org]
> Sent: 19 November 2012 11:10
> To: java-user@lucene.apache.org
> Subject: Re: German 'ue' -> 'u' conversion
>
> look for filters that use the ICU4J library
>
>
> On 11/19/2012 2:08 AM, Lutz Fechner wrote:
>> Hi,
>>
>> We use a slightly modified ISOLatin1AccentFilter to replace German accents with ae, oe, ue and so on for that purpose.
>>
>> In the code you will see a switch for the characters.
>>
>>
>> You need to change it from
>>
>> case '\u00E4' : // small ä
>>             output[outputPos++] = 'a';
>>             output[outputPos++] = 'e';
>>             break;
>>
>> To something like this
>>
>> case '\u00E4' : // small ä
>>             output[outputPos++] = 'a';
>>             break;
>>
>> for the characters you want to replace.
>>
>>
>> Best Regards
>>
>> Lutz Fechner
>>
>>
>>
>>
>> -----Original Message-----
>> From: Dyga, Adam [mailto:adam.dyga@beumergroup.com]
>> Sent: Monday, 19 November 2012 10:47
>> To: java-user@lucene.apache.org
>> Subject: German 'ue' -> 'u' conversion
>>
>> Hello,
>>
>> I have two questions regarding the handling of German umlauts in Lucene:
>>
>> 1. I'm trying to find a way to convert German umlauts written as 'ue', 'ae', etc. to the folded form 'u', 'a' and so on.
>> This is done by GermanAnalyzer (and the German2StemFilter used by it), but unfortunately it also does stemming, which is very undesirable in my case.
>> Is there any other filter that can do only the 'ue' -> 'u' conversion?
>>
>> 2. Is there any filter that does the 'ü' -> 'ue' (NOT 'u') conversion? What I'm trying to achieve is that the word "über" should be found in the index whenever the user searches for "über" or "ueber", but NOT "uber".
>>
>> Regards,
>> AD
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>

