lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: LowerCaseFilter fails one letter (I) of Turkish alphabet
Date Mon, 30 Nov 2009 19:46:22 GMT
On Mon, Nov 30, 2009 at 8:08 PM, Robert Muir <rcmuir@gmail.com> wrote:
>> I am not sure whether it is worth adding a new TokenFilter for the Turkish
>> language. I see that GreekLowerCaseFilter and RussianLowerCaseFilter exist.
>> It would be nice to see a TurkishLowerCaseFilter in Lucene.
>>
> Just to clarify, GreekLowerCaseFilter really shouldn't exist either. Its
> final sigma problem (lowercase sigma has two forms, depending on position
> in the word) is also solved by Unicode case folding or collation. This is
> a perfect example of why lowercasing is the wrong operation for search.
>
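
To illustrate the final sigma point, here is a minimal sketch using only the JDK, not Lucene. The class name SigmaFolding and the helper fold() are hypothetical, and the uppercase-then-lowercase round trip is only a rough stand-in for real Unicode case folding (it is the same per-character trick String.equalsIgnoreCase relies on), so treat it as an approximation rather than what any Lucene filter does.

```java
import java.util.Locale;

// Hypothetical demo class; not part of Lucene.
class SigmaFolding {

    // Approximate simple case folding: map each code point to upper case,
    // then back to lower case. Both lowercase sigmas (U+03C3 and the
    // word-final U+03C2) have capital sigma U+03A3 as their upper case,
    // so both end up as U+03C3.
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .map(cp -> Character.toLowerCase(Character.toUpperCase(cp)))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String upper = "\u039F\u0394\u039F\u03A3";      // capital omicron, delta, omicron, sigma
        String lower = upper.toLowerCase(Locale.ROOT);  // String.toLowerCase is context-sensitive for sigma
        String medial = "\u03BF\u03B4\u03BF\u03C3";     // same word typed with a non-final sigma

        // Plain lowercasing can leave two spellings that do not compare equal.
        System.out.println(lower.equals(medial));
        // After folding, both spellings collapse to one form.
        System.out.println(fold(lower).equals(fold(medial)));
    }
}
```

With folding, the indexed term and the query term match regardless of which sigma form either one used.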
> And RussianLowerCaseFilter is deprecated now; it does exactly the same
> thing as LowerCaseFilter.
Btw, we should fix supplementary character handling in there too, even though it is deprecated.
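
For anyone wondering what goes wrong with supplementary characters, here is a rough sketch in plain Java. It is not the actual filter code; the class and method names are made up for illustration. It also assumes the lowercase form of a code point occupies the same number of chars as the original, which holds for the Deseret letter used here but is an assumption in general.

```java
// Hypothetical demo class; not the Lucene filter implementation.
class SupplementaryLowerCase {

    // Char-by-char lowercasing: surrogate halves are not letters on their
    // own, so characters outside the BMP pass through unchanged.
    static void lowerCaseByChar(char[] buf, int len) {
        for (int i = 0; i < len; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
    }

    // Code-point-aware lowercasing: read a full code point, lowercase it,
    // and write it back in place. Assumes the lowercase form needs the
    // same number of chars as the original.
    static void lowerCaseByCodePoint(char[] buf, int len) {
        for (int i = 0; i < len; ) {
            int cp = Character.codePointAt(buf, i, len);
            i += Character.toChars(Character.toLowerCase(cp), buf, i);
        }
    }

    public static void main(String[] args) {
        // "A" followed by U+10400 (DESERET CAPITAL LETTER LONG I).
        char[] a = "A\uD801\uDC00".toCharArray();
        char[] b = a.clone();

        lowerCaseByChar(a, a.length);
        lowerCaseByCodePoint(b, b.length);

        // Print the last code point of each buffer in hex.
        System.out.println(Integer.toHexString(Character.codePointAt(a, 1)));
        System.out.println(Integer.toHexString(Character.codePointAt(b, 1)));
    }
}
```

The char-based loop leaves U+10400 untouched, while the code point loop maps it to U+10428.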

>
> --
> Robert Muir
> rcmuir@gmail.com
>


